Commit graph

3455 commits

Author SHA1 Message Date
Andre Oppermann
a9c92b54a9 In the case the destination of a packet was changed by the packet filter
to point to a local IP address; and the packet was sourced from this host
we fill in the m_pkthdr.rcvif with a pointer to the loopback interface.

Before the function ifunit("lo0") was used to obtain the ifp.  However
this is sub-optimal from a performance point of view and might be dangerous
if the loopback interface has been renamed.  Use the global variable 'loif'
instead which always points to the loopback interface.

Submitted by:	brooks
2004-08-27 15:39:34 +00:00
Andre Oppermann
319c4c256a Remove a junk line left over from the recent IPFW to PFIL_HOOKS conversion. 2004-08-27 15:32:28 +00:00
Andre Oppermann
c21fd23260 Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option.  All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over.  This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
2004-08-27 15:16:24 +00:00
Ruslan Ermilov
9bfe6d472a Revert the last change to sys/modules/ipfw/Makefile and fix a
standalone module build in a better way.

Silence from:	andre
MFC after:	3 days
2004-08-26 14:18:30 +00:00
Pawel Jakub Dawidek
a7f3feff1b Allocate memory when dumping pipes with M_WAITOK flag.
On a system with huge number of pipes, M_NOWAIT failes almost always,
because of memory fragmentation.
My fix is different than the patch proposed by Pawel Malachowski,
because in FreeBSD 5.x we cannot sleep while holding dummynet mutex
(in 4.x there is no such lock).
My fix is also ugly, but there is no easy way to prepare nice and clean fix.

PR:		kern/46557
Submitted by:	Eugene Grosbein <eugen@grosbein.pp.ru>
Reviewed by:	mlaier
2004-08-25 09:31:30 +00:00
Max Laier
ca7a789aa6 Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel.
Previously the early drop was disabled unconditionally for ALTQ-enabled
kernels.

This should give some benefit for the normal gateway + LAN-server case with
a busy LAN leg and an ALTQ managed uplink.

Reviewed and style help from:	cperciva, pjd
2004-08-22 16:42:28 +00:00
Robert Watson
392e840716 When sliding the m_data pointer forward, update m_pktrhdr.len as well
as m_len, or the pkthdr length will be inconsistent with the actual
length of data in the mbuf chain.  The symptom of this occuring was
"out of data" warnings from in_cksum_skip() on large UDP packets sent
via the loopback interface.

Foot shot:	green
2004-08-22 01:32:48 +00:00
Christian S.J. Peron
5090559b7f When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.

This commit makes the following changes:

- Add a comment to rtioctl warning developers that if they add
  any ioctl commands, they should use super-user checks where necessary,
  as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
  and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
  is PRISON root, make sure the socket option name is IP_HDRINCL,
  otherwise deny the request.

Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.

Looking forward, we will probably want to eliminate the
references to curthread.

This may be a MFC candidate for RELENG_5.

Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
2004-08-21 17:38:57 +00:00
Robert Watson
e6ccd70936 When prepending space onto outgoing UDP datagram payloads to hold the
UDP/IP header, make sure that space is also allocated for the link
layer header.  If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer.  This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
2004-08-21 16:14:04 +00:00
Andre Oppermann
ce63226177 Fix a stupid typo which prevented an ipfw KLD unload from successfully cleaning
up its remains.  Do not terminate 'if' lines with ';'.

Spotted by:	claudio@OpenBSD.ORG (sitting 3m from my desk)
Pointy hat to:	andre
2004-08-20 00:36:55 +00:00
Andre Oppermann
70222723f3 When unloading ipfw module use callout_drain() to make absolutely sure that
all callouts are stopped and finished.  Move it before IPFW_LOCK() to avoid
deadlocking when draining callouts.
2004-08-19 23:31:40 +00:00
Andre Oppermann
6f2d4ea6f8 For IPv6 access pointer to tcpcb only after we have checked it is valid.
Found by:	Coverity's automated analysis (via Ted Unangst)
2004-08-19 20:16:17 +00:00
Andre Oppermann
50ab727669 Give a useful error message if someone tries to compile IPFIREWALL into the
kernel without specifying PFIL_HOOKS as well.
2004-08-19 18:38:23 +00:00
Andre Oppermann
9108601915 Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building
the ipfw KLD.

 For IPFIREWALL_FORWARD this does not have any side effects.  If the module
 has it but not the kernel it just doesn't do anything.

 For IPDIVERT the KLD will be unloadable if the kernel doesn't have IPDIVERT
 compiled in too.  However this is the least disturbing behaviour.  The user
 can just recompile either module or the kernel to match the other one.  The
 access to the machine is not denied if ipfw refuses to load.
2004-08-19 17:59:26 +00:00
Andre Oppermann
e4c97eff8e Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts
and to be able to disable ipfw if it was compiled directly into the kernel.
2004-08-19 17:38:47 +00:00
Robert Watson
5c32ea6517 Push down pcbinfo and inpcb locking from udp_send() into udp_output().
This provides greater context for the locking and allows us to avoid
locking the pcbinfo structure if not binding operations will take
place (i.e., already bound, connected, and no expliti sendto()
address).
2004-08-19 01:13:10 +00:00
Robert Watson
4c2bb15a89 In in_pcbrehash(), do assert the inpcb lock as well as the pcbinfo lock. 2004-08-19 01:11:17 +00:00
Robert Watson
0f48e25b63 Fix build of ip_input.c with "options IPSEC" -- the "pass:" label
is used with both FAST_IPSEC and IPSEC, but was defined for only
FAST_IPSEC.
2004-08-18 03:11:04 +00:00
Peter Wemm
1e5cc10dc2 Make the kernel compile again if you are not using PFIL_HOOKS 2004-08-18 00:37:46 +00:00
Andre Oppermann
9b932e9e04 Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland
and preserves the ipfw ABI.  The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.

However there are many changes how ipfw is and its add-on's are handled:

 In general ipfw is now called through the PFIL_HOOKS and most associated
 magic, that was in ip_input() or ip_output() previously, is now done in
 ipfw_check_[in|out]() in the ipfw PFIL handler.

 IPDIVERT is entirely handled within the ipfw PFIL handlers.  A packet to
 be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
 reassembly.  If not, or all fragments arrived and the packet is complete,
 divert_packet is called directly.  For 'tee' no reassembly attempt is made
 and a copy of the packet is sent to the divert socket unmodified.  The
 original packet continues its way through ip_input/output().

 ipfw 'forward' is done via m_tag's.  The ipfw PFIL handlers tag the packet
 with the new destination sockaddr_in.  A check if the new destination is a
 local IP address is made and the m_flags are set appropriately.  ip_input()
 and ip_output() have some more work to do here.  For ip_input() the m_flags
 are checked and a packet for us is directly sent to the 'ours' section for
 further processing.  Destination changes on the input path are only tagged
 and the 'srcrt' flag to ip_forward() is set to disable destination checks
 and ICMP replies at this stage.  The tag is going to be handled on output.
 ip_output() again checks for m_flags and the 'ours' tag.  If found, the
 packet will be dropped back to the IP netisr where it is going to be picked
 up by ip_input() again and the directly sent to the 'ours' section.  When
 only the destination changes, the route's 'dst' is overwritten with the
 new destination from the forward m_tag.  Then it jumps back at the route
 lookup again and skips the firewall check because it has been marked with
 M_SKIP_FIREWALL.  ipfw 'forward' has to be compiled into the kernel with
 'option IPFIREWALL_FORWARD' to enable it.

 DUMMYNET is entirely handled within the ipfw PFIL handlers.  A packet for
 a dummynet pipe or queue is directly sent to dummynet_io().  Dummynet will
 then inject it back into ip_input/ip_output() after it has served its time.
 Dummynet packets are tagged and will continue from the next rule when they
 hit the ipfw PFIL handlers again after re-injection.

 BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
 they did before.  Later this will be changed to dedicated ETHER PFIL_HOOKS.

More detailed changes to the code:

 conf/files
	Add netinet/ip_fw_pfil.c.

 conf/options
	Add IPFIREWALL_FORWARD option.

 modules/ipfw/Makefile
	Add ip_fw_pfil.c.

 net/bridge.c
	Disable PFIL_HOOKS if ipfw for bridging is active.  Bridging ipfw
	is still directly invoked to handle layer2 headers and packets would
	get a double ipfw when run through PFIL_HOOKS as well.

 netinet/ip_divert.c
	Removed divert_clone() function.  It is no longer used.

 netinet/ip_dummynet.[ch]
	Neither the route 'ro' nor the destination 'dst' need to be stored
	while in dummynet transit.  Structure members and associated macros
	are removed.

 netinet/ip_fastfwd.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_fw.h
	Removed 'ro' and 'dst' from struct ip_fw_args.

 netinet/ip_fw2.c
	(Re)moved some global variables and the module handling.

 netinet/ip_fw_pfil.c
	New file containing the ipfw PFIL handlers and module initialization.

 netinet/ip_input.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.  ip_forward() does not longer require
	the 'next_hop' struct sockaddr_in argument.  Disable early checks
	if 'srcrt' is set.

 netinet/ip_output.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_var.h
	Add ip_reass() as general function.  (Used from ipfw PFIL handlers
	for IPDIVERT.)

 netinet/raw_ip.c
	Directly check if ipfw and dummynet control pointers are active.

 netinet/tcp_input.c
	Rework the 'ipfw forward' to local code to work with the new way of
	forward tags.

 netinet/tcp_sack.c
	Remove include 'opt_ipfw.h' which is not needed here.

 sys/mbuf.h
	Remove m_claim_next() macro which was exclusively for ipfw 'forward'
	and is no longer needed.

Approved by:	re (scottl)
2004-08-17 22:05:54 +00:00
Robert Watson
a4f757cd5d White space cleanup for netinet before branch:
- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by:	re (scottl)
Submitted by:	Xin LI <delphij@frontfree.net>
2004-08-16 18:32:07 +00:00
David E. O'Brien
5af87d0ea1 Put the 'antispoof' opcode in the proper place in the opcode list such
that it doesn't break the ipfw2 ABI.
2004-08-16 12:05:19 +00:00
David Malone
1f44b0a1b5 Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD
have already done this, so I have styled the patch on their work:

        1) introduce a ip_newid() static inline function that checks
        the sysctl and then decides if it should return a sequential
        or random IP ID.

        2) named the sysctl net.inet.ip.random_id

        3) IPv6 flow IDs and fragment IDs are now always random.
        Flow IDs and frag IDs are significantly less common in the
        IPv6 world (ie. rarely generated per-packet), so there should
        be smaller performance concerns.

The sysctl defaults to 0 (sequential IP IDs).

Reviewed by:	andre, silby, mlaier, ume
Based on:	NetBSD
MFC after:	2 months
2004-08-14 15:32:40 +00:00
Poul-Henning Kamp
e7581f0fc2 Fix outgoing ICMP on global instance. 2004-08-14 14:21:09 +00:00
Christian S.J. Peron
31c88a3043 Add the ability to associate ipfw rules with a specific prison ID.
Since the only thing truly unique about a prison is it's ID, I figured
this would be the most granular way of handling this.

This commit makes the following changes:

- Adds tokenizing and parsing for the ``jail'' command line option
  to the ipfw(8) userspace utility.
- Append the ipfw opcode list with O_JAIL.
- While Iam here, add a comment informing others that if they
  want to add additional opcodes, they should append them to the end
  of the list to avoid ABI breakage.
- Add ``fw_prid'' to the ipfw ucred cache structure.
- When initializing ucred cache, if the process is jailed,
  set fw_prid to the prison ID, otherwise set it to -1.
- Update man page to reflect these changes.

This change was a strong motivator behind the ucred caching
mechanism in ipfw.

A sample usage of this new functionality could be:

    ipfw add count ip from any to any jail 2

It should be noted that because ucred based constraints
are only implemented for TCP and UDP packets, the same
applies for jail associations.

Conceptual head nod by:	pjd
Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
2004-08-12 22:06:55 +00:00
David Malone
849112666a In tcp6_ctlinput, lock tcbinfo around the call to syncache_unreach
so that the locks held are the same as the IPv4 case.

Reviewed by:	rwatson
2004-08-12 18:19:36 +00:00
Andre Oppermann
9d804f818c Fix two cases of incorrect IPQ_UNLOCK'ing in the merged ip_reass() function.
The first one was going to 'dropfrag', which unlocks the IPQ, before the lock
was aquired; The second one doing a unlock and then a 'goto dropfrag' which
led to a double-unlock.

Tripped over by:	des
2004-08-12 08:37:42 +00:00
Robert Watson
c19c5239a6 When udp_send() fails, make sure to free the control mbufs as well as
the data mbuf.  This was done in most error cases, but not the case
where the inpcb pointer is surprisingly NULL.
2004-08-12 01:34:27 +00:00
Andre Oppermann
420a281164 Backout removal of UMA_ZONE_NOFREE flag for all zones which are established
for structures with timers in them.  It might be that a timer might fire
even when the associated structure has already been free'd.  Having type-
stable storage in this case is beneficial for graceful failure handling and
debugging.

Discussed with:	bosko, tegge, rwatson
2004-08-11 20:30:08 +00:00
Andre Oppermann
4efb805c0c Remove the UMA_ZONE_NOFREE flag to all uma_zcreate() calls in the IP and
TCP code.  This flag would have prevented giving back excessive free slabs
to the global pool after a transient peak usage.
2004-08-11 17:08:31 +00:00
Andre Oppermann
67d0b24ed1 Make use of in_localip() function and replace previous direct LIST_FOREACH
loops over INADDR_HASH.
2004-08-11 12:32:10 +00:00
Andre Oppermann
2eccc90b61 Add the function in_localip() which returns 1 if an internet address is for
the local host and configured on one of its interfaces.
2004-08-11 11:49:48 +00:00
Andre Oppermann
6e234ede37 Only invoke verify_path() for verrevpath and versrcreach when we have an IP packet. 2004-08-11 11:41:11 +00:00
Andre Oppermann
767981878c Only check for local broadcast addresses if the mbuf is flagged with M_BCAST. 2004-08-11 10:49:56 +00:00
Andre Oppermann
0b17fba7bc Consistently use NULL for pointer comparisons. 2004-08-11 10:46:15 +00:00
Andre Oppermann
de2e5d1e20 Make IP fastforwarding ALTQ-aware by adding the input traffic conditioner
check and disabling the early output interface queue length check.
2004-08-11 10:42:59 +00:00
Andre Oppermann
2f6e6e9b4c Correct the displayed bandwidth calculation for a readout via sysctl. The
saved value does not have to be scaled with HZ; it is already in bytes per
second.  Only the multiply by eight remains to show bits per second (bps).
2004-08-11 10:12:16 +00:00
Robert Watson
27f74fd0ed Assert the locks of inpcbinfo's and inpcb's passed into in_pcbconnect()
and in_pcbconnect_setup(), since these functions frob the port and
address state of inpcbs.
2004-08-11 04:35:20 +00:00
Andre Oppermann
bb7c5b3055 Make a comment that IP source routing is not SMP and PREEMPTION safe. 2004-08-09 16:17:37 +00:00
Andre Oppermann
a5053398d4 Make a comment that "ipfw forward" is not SMP and PREEMPTION safe. 2004-08-09 16:16:10 +00:00
Andre Oppermann
5f9541ecbd New ipfw option "antispoof":
For incoming packets, the packet's source address is checked if it
 belongs to a directly connected network.  If the network is directly
 connected, then the interface the packet came on in is compared to
 the interface the network is connected to.  When incoming interface
 and directly connected interface are not the same, the packet does
 not match.

Usage example:

 ipfw add deny ip from any to any not antispoof in

Manpage education by:	ru
2004-08-09 16:12:10 +00:00
Robert Watson
f31f65a708 Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events.  This prevents inpcb
assertions from firing when IPv6 generates and delievers event
notifications for inpcbs.

Reported by:	kuriyama
Tested by:	kuriyama
2004-08-06 03:45:45 +00:00
Robert Watson
9c1df6951f When iterating the UDP inpcb list processing an inbound broadcast
or multicast packet, we don't need to acquire the inpcb mutex
unless we are actually using inpcb fields other than the bound port
and address.  Since we hold the pcbinfo lock already, these can't
change.  Defer acquiring the inpcb mutex until we have a high
chance of a match.  This avoids about 120 mutex operations per UDP
broadcast packet received on one of my work systems.

Reviewed by:	sam
2004-08-06 02:08:31 +00:00
Robert Watson
98aed8ca56 Now that IPv6 performs basic in6pcb and inpcb locking, enable inpcb
lock assertions even if IPv6 is compiled into the kernel.  Previously,
inclusion of IPv6 and locking assertions would result in a rapid
assertion failure as IPv6 was not properly locking inpcbs.
2004-08-04 18:27:55 +00:00
Joe Marcus Clarke
5c7e7e80cc Fix Skinny and PPTP NAT'ing after the introduction of the {ip,tcp,udp}_next
functions.  Basically, the ip_next() function was used to get the PPTP and
Skinny headers when tcp_next() should have been used instead.  Symptoms of
this included a segfault in natd when trying to process a PPTP or Skinny
packet.

Approved by:	des
2004-08-04 15:17:08 +00:00
Andre Oppermann
81007fd4eb o Delayed checksums are now calculated in divert_packet() for diverted packets
Remove the XXX-escaped code that did it in ip_output()'s IPHACK section.
2004-08-03 14:13:36 +00:00
Andre Oppermann
24a098ea9b o Move the inflight sysctls to their own sub-tree under net.inet.tcp to be
more consistent with the other sysctls around it.
2004-08-03 13:54:11 +00:00
Andre Oppermann
f0cada84b1 o Move all parts of the IP reassembly process into the function ip_reass() to
make it fully self-contained.
o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len
  including the IP header.
o Computation of the delayed checksum is moved into divert_packet().

Reviewed by:	silby
2004-08-03 12:31:38 +00:00
Jeffrey Hsu
2ff39e1543 Fix bug with tracking the previous element in a list.
Found by:	edrt@citiz.net
Submitted by:	pavlin@icir.org
2004-08-03 02:01:44 +00:00
Yaroslav Tykhiy
a4eb4405e3 Disallow a particular kind of port theft described by the following scenario:
Alice is too lazy to write a server application in PF-independent
	manner.  Therefore she knocks up the server using PF_INET6 only
	and allows the IPv6 socket to accept mapped IPv4 as well.  An evil
	hacker known on IRC as cheshire_cat has an account in the same
	system.  He starts a process listening on the same port as used
	by Alice's server, but in PF_INET.  As a consequence, cheshire_cat
	will distract all IPv4 traffic supposed to go to Alice's server.

Such sort of port theft was initially enabled by copying the code that
implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for
the implied case of the same owner for both connections.  After this
change, the above scenario will be impossible.  In the same setting,
the user who attempts to start his server last will get EADDRINUSE.

Of course, using IPv4 mapped to IPv6 leads to security complications
in the first place, but there is no reason to make it even more unsafe.

This change doesn't apply to KAME since it affects a FreeBSD-specific
part of the code.  It doesn't modify the out-of-box behaviour of the
TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.

MFC after:	1 month
2004-07-28 13:03:07 +00:00
Jayanth Vijayaraghavan
5d3b1b7556 Fix a bug in the sack code that was causing data to be retransmitted
with the FIN bit set for all segments, if a FIN has already been sent before.
The fix will allow the FIN bit to be set for only the last segment, in case
it has to be retransmitted.

Fix another bug that would have caused snd_nxt to be pulled by len if
there was an error from ip_output. snd_nxt should not be touched
during sack retransmissions.
2004-07-28 02:15:14 +00:00
Jayanth Vijayaraghavan
e9f2f80e09 Fix for a SACK bug where the very last segment retransmitted
from the SACK scoreboard could result in the next (untransmitted)
segment to be skipped.
2004-07-26 23:41:12 +00:00
John-Mark Gurney
0aa8ce5012 compare pointer against NULL, not 0
when inpcb is NULL, this is no longer invalid since jlemon added the
tcp_twstart function... this prevents close "failing" w/ EINVAL when it
really was successful...

Reviewed by:	jeremy (NetBSD)
2004-07-26 21:29:56 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Andre Oppermann
55db762b76 Extend versrcreach by checking against the rt_flags for RTF_REJECT and
RTF_BLACKHOLE as well.

To quote the submitter:

 The uRPF loose-check implementation by the industry vendors, at least on Cisco
 and possibly Juniper, will fail the check if the route of the source address
 is pointed to Null0 (on Juniper, discard or reject route). What this means is,
 even if uRPF Loose-check finds the route, if the route is pointed to blackhole,
 uRPF loose-check must fail. This allows people to utilize uRPF loose-check mode
 as a pseudo-packet-firewall without using any manual filtering configuration --
 one can simply inject a IGP or BGP prefix with next-hop set to a static route
 that directs to null/discard facility. This results in uRPF Loose-check failing
 on all packets with source addresses that are within the range of the nullroute.

Submitted by:	James Jun <james@towardex.com>
2004-07-21 19:55:14 +00:00
Robert Watson
2d01d331c6 M_PREPEND() the IP header on to the front of an outgoing raw IP packet
using M_DONTWAIT rather than M_WAITOK to avoid sleeping on memory
while holding a mutex.
2004-07-20 20:52:30 +00:00
Jayanth Vijayaraghavan
04f0d9a0ea Let IN_FASTREOCOVERY macro decide if we are in recovery mode.
Nuke sackhole_limit for now. We need to add it back to limit the total
number of sack blocks in the system.
2004-07-19 22:37:33 +00:00
Jayanth Vijayaraghavan
f787edd847 Fix a potential panic in the SACK code that was causing
1) data to be sent to the right of snd_recover.
2) send more data then whats in the send buffer.

The fix is to postpone sack retransmit to a subsequent recovery episode
if the current retransmit pointer is beyond snd_recover.

Thanks to Mohan Srinivasan for helping fix the bug.

Submitted by:Daniel Lang
2004-07-19 22:06:01 +00:00
David Malone
932312d60b Fix the !INET6 build.
Reported by:	alc
2004-07-17 21:40:14 +00:00
David Malone
969860f3ed The tcp syncache code was leaving the IPv6 flowlabel uninitialised
for the SYN|ACK packet and then letting in6_pcbconnect set the
flowlabel later. Arange for the syncache/syncookie code to set and
recall the flow label so that the flowlabel used for the SYN|ACK
is consistent. This is done by using some of the cookie (when tcp
cookies are enabeled) and by stashing the flowlabel in syncache.

Tested and Discovered by:	Orla McGann <orly@cnri.dit.ie>
Approved by:			ume, silby
MFC after:			1 month
2004-07-17 19:44:13 +00:00
Max Laier
c550f2206d Define semantic of M_SKIP_FIREWALL more precisely, i.e. also pass associated
icmp_error() packets. While here retire PACKET_TAG_PF_GENERATED (which
served the same purpose) and use M_SKIP_FIREWALL in pf as well. This should
speed up things a bit as we get rid of the tag allocations.

Discussed with:	juli
2004-07-17 05:10:06 +00:00
Juli Mallett
765d141c78 Make M_SKIP_FIREWALL a global (and semantic) flag, preventing anything from
using M_PROTO6 and possibly shooting someone's foot, as well as allowing the
firewall to be used in multiple passes, or with a packet classifier frontend,
that may need to explicitly allow a certain packet.  Presently this is handled
in the ipfw_chk code as before, though I have run with it moved to upper
layers, and possibly it should apply to ipfilter and pf as well, though this
has not been investigated.

Discussed with:	luigi, rwatson
2004-07-17 02:40:13 +00:00
Hajimu UMEMOTO
8a59da300c when IN6P_AUTOFLOWLABEL is set, the flowlabel is not set on
outgoing tcp connections.

Reported by:	Orla McGann <orly@cnri.dit.ie>
Reviewed by:	Orla McGann <orly@cnri.dit.ie>
Obtained from:	KAME
2004-07-16 18:08:13 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Stefan Farfeleder
439dfb0c35 Remove erroneous semicolons. 2004-07-13 16:06:19 +00:00
Robert Watson
7cfc690440 After each label in tcp_input(), assert the inpcbinfo and inpcb lock
state that we expect.
2004-07-12 19:28:07 +00:00
Brian Somers
0ac4013324 Change the following environment variables to kernel options:
bootp -> BOOTP
    bootp.nfsroot -> BOOTP_NFSROOT
    bootp.nfsv3 -> BOOTP_NFSV3
    bootp.compat -> BOOTP_COMPAT
    bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit.  It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone
2004-07-08 22:35:36 +00:00
Brian Somers
59e1ebc9b5 Change the following kernel options to environment variables:
BOOTP -> bootp
    BOOTP_NFSROOT -> bootp.nfsroot
    BOOTP_NFSV3 -> bootp.nfsv3
    BOOTP_COMPAT -> bootp.compat
    BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

    bootp="YES"
    bootp.nfsroot="YES"
    bootp.nfsv3="YES"
    bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.
2004-07-08 13:40:33 +00:00
Dag-Erling Smørgrav
de47739e71 Push WARNS back up to 6, but define NO_WERROR; I want the warts out in the
open where people can see them and hopefully fix them.
2004-07-06 12:15:24 +00:00
Dag-Erling Smørgrav
9fa0fd2682 Introduce inline {ip,udp,tcp}_next() functions which take a pointer to an
{ip,udp,tcp} header and return a void * pointing to the payload (i.e. the
first byte past the end of the header and any required padding).  Use them
consistently throughout libalias to a) reduce code duplication, b) improve
code legibility, c) get rid of a bunch of alignment warnings.
2004-07-06 12:13:28 +00:00
Dag-Erling Smørgrav
e3e2c21639 Rewrite twowords() to access its argument through a char pointer and not
a short pointer.  The previous implementation seems to be in a gray zone
of the C standard, and GCC generates incorrect code for it at -O2 or
higher on some platforms.
2004-07-06 09:22:18 +00:00
Dag-Erling Smørgrav
95347a8ee0 Temporarily lower WARNS to 3 while I figure out the alignment issues on
alpha.
2004-07-06 08:44:41 +00:00
Dag-Erling Smørgrav
ed01a58215 Make libalias WARNS?=6-clean. This mostly involves renaming variables
named link, foo_link or link_foo to lnk, foo_lnk or lnk_foo, fixing
signed / unsigned comparisons, and shoving unused function arguments
under the carpet.

I was hoping WARNS?=6 might reveal more serious problems, and perhaps
the source of the -O2 breakage, but found no smoking gun.
2004-07-05 11:10:57 +00:00
Dag-Erling Smørgrav
ffcb611a9d Parenthesize return values. 2004-07-05 10:55:23 +00:00
Dag-Erling Smørgrav
f311ebb4ec Mechanical whitespace cleanup. 2004-07-05 10:53:28 +00:00
Poul-Henning Kamp
e6bbb69149 Add LibAliasOutTry() which checks a packet for a hit in the tables, but
does not create a new entry if none is found.
2004-07-04 12:53:07 +00:00
Ruslan Ermilov
1a0a934547 Mechanically kill hard sentence breaks. 2004-07-02 23:52:20 +00:00
Jayanth Vijayaraghavan
a0445c2e2c On receiving 3 duplicate acknowledgements, SACK recovery was not being entered correctly.
Fix this problem by separating out the SACK and the newreno cases. Also, check
if we are in FASTRECOVERY for the sack case and if so, turn off dupacks.

Fix an issue where the congestion window was not being incremented by ssthresh.

Thanks to Mohan Srinivasan for finding this problem.
2004-07-01 23:34:06 +00:00
Ruslan Ermilov
c9a246418d Bumped document date.
Fixed markup.
Fixed examples to match the new API.
2004-07-01 17:51:48 +00:00
Poul-Henning Kamp
e3e244bff6 Rwatson, write 100 times for tomorrow:
First unlock, then assign NULL to pointer.
2004-06-27 21:54:34 +00:00
Pawel Jakub Dawidek
0a44517d3a Those are unneeded too. 2004-06-27 09:06:10 +00:00
Pawel Jakub Dawidek
46e3b1cbe7 Add two missing includes and remove two uneeded.
This is quite serious fix, because even with MAC framework compiled in,
MAC entry points in those two files were simply ignored.
2004-06-27 09:03:22 +00:00
Robert Watson
1e4d7da707 Reduce the number of unnecessary unlock-relocks on socket buffer mutexes
associated with performing a wakeup on the socket buffer:

- When performing an sbappend*() followed by a so[rw]wakeup(), explicitly
  acquire the socket buffer lock and use the _locked() variants of both
  calls.  Note that the _locked() sowakeup() versions unlock the mutex on
  return.  This is done in uipc_send(), divert_packet(), mroute
  socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append().

- When the socket buffer lock is dropped before a sowakeup(), remove the
  explicit unlock and use the _locked() sowakeup() variant.  This is done
  in soisdisconnecting(), soisdisconnected() when setting the can't send/
  receive flags and dropping data, and in uipc_rcvd() which adjusting
  back-pressure on the sockets.

For UNIX domain sockets running mpsafe with a contention-intensive SMP
mysql benchmark, this results in a 1.6% query rate improvement due to
reduce mutex costs.
2004-06-26 19:10:39 +00:00
Robert Watson
3f9d1ef905 Remove spl's from TCP protocol entry points. While not all locking
is merged here yet, this will ease the merge process by bringing the
locked and unlocked versions into sync.
2004-06-26 17:50:50 +00:00
Paul Saab
652178a12a White space & spelling fixes
Submitted by:	Xin LI <delphij@frontfree.net>
2004-06-25 04:11:26 +00:00
Bruce M Simpson
37332f049f Whitespace. 2004-06-25 02:29:58 +00:00
Robert Watson
5905999b2f Broaden scope of the socket buffer lock when processing an ACK so that
the read and write of sb_cc are atomic.  Call sbdrop_locked() instead
of sbdrop() since we already hold the socket buffer lock.
2004-06-24 03:07:27 +00:00
Robert Watson
927c5cea3f Protect so_oobmark with with SOCKBUF_LOCK(&so->so_rcv), and broaden
locking in tcp_input() for TCP packets with urgent data pointers to
hold the socket buffer lock across testing and updating oobmark
from just protecting sb_state.

Update socket locking annotations
2004-06-24 02:57:12 +00:00
Robert Watson
a138d21769 In ip_ctloutput(), acquire the inpcb lock around some of the basic
inpcb flag and status updates.
2004-06-24 02:05:47 +00:00
Robert Watson
d67ec3dd48 When asserting non-Giant locks in the network stack, also assert
Giant if debug.mpsafenet=0, as any points that require synchronization
in the SMPng world also required it in the Giant-world:

- inpcb locks (including IPv6)
- inpcbinfo locks (including IPv6)
- dummynet subsystem lock
- ipfw2 subsystem lock
2004-06-24 02:01:48 +00:00
Robert Watson
3f11a2f374 Introduce sbreserve_locked(), which asserts the socket buffer lock on
the socket buffer having its limits adjusted.  sbreserve() now acquires
the lock before calling sbreserve_locked().  In soreserve(), acquire
socket buffer locks across read-modify-writes of socket buffer fields,
and calls into sbreserve/sbrelease; make sure to acquire in keeping
with the socket buffer lock order.  In tcp_mss(), acquire the socket
buffer lock in the calling context so that we have atomic read-modify
-write on buffer sizes.
2004-06-24 01:37:04 +00:00
Paul Saab
76947e3222 Move the sack sysctl's under net.inet.tcp.sack
net.inet.tcp.do_sack -> net.inet.tcp.sack.enable
net.inet.tcp.sackhole_limit -> net.inet.tcp.sack.sackhole_limit

Requested by:	wollman
2004-06-23 21:34:07 +00:00
Paul Saab
6d90faf3d8 Add support for TCP Selective Acknowledgements. The work for this
originated on RELENG_4 and was ported to -CURRENT.

The scoreboarding code was obtained from OpenBSD, and many
of the remaining changes were inspired by OpenBSD, but not
taken directly from there.

You can enable/disable sack using net.inet.tcp.do_sack. You can
also limit the number of sack holes that all senders can have in
the scoreboard with net.inet.tcp.sackhole_limit.

Reviewed by:	gnn
Obtained from:	Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
2004-06-23 21:04:37 +00:00
Robert Watson
bb7479a613 Acquire socket lock around frobbing of socket state in divert sockets. 2004-06-22 04:00:51 +00:00
Robert Watson
ffcbc0e4c5 Prefer use of the inpcb as a MAC label source for outgoing packets sent
via divert sockets, when available.
2004-06-22 03:58:50 +00:00
Robert Watson
d330008e3b If debug.mpsafenet is set, initialize TCP callouts as CALLOUT_MPSAFE. 2004-06-20 21:44:50 +00:00
Robert Watson
1f82efb3b7 Assert the inpcb lock before letting MAC check whether we can deliver
to the inpcb in tcp_input().
2004-06-20 20:17:29 +00:00
Robert Watson
1b83216eda IP multicast code no longer needs to acquire Giant before appending
an mbuf onto a socket buffer.  This is left over from debug.mpsafenet
affecting the forwarding/bridging plane only.
2004-06-20 20:10:05 +00:00
Robert Watson
4e397bc524 In tcp_ctloutput(), don't hold the inpcb lock over a call to
ip_ctloutput(), as it may need to perform blocking memory allocations.
This also improves consistency with locking relative to other points
that call into ip_ctloutput().

Bumped into by:	Grover Lines <grover@ceribus.net>
2004-06-18 20:22:21 +00:00
Bruce M Simpson
4f450ff9a5 Check that m->m_pkthdr.rcvif is not NULL before checking if a packet
was received on a broadcast address on the input path. Under certain
circumstances this could result in a panic, notably for locally-generated
packets which do not have m_pkthdr.rcvif set.

This is a similar situation to that which is solved by
src/sys/netinet/ip_icmp.c rev 1.66.

PR:		kern/52935
2004-06-18 12:58:45 +00:00
Bruce M Simpson
f3e0b7ef7f Appease GCC. 2004-06-18 09:53:58 +00:00
Bruce M Simpson
5214cb3f59 If SO_DEBUG is enabled for a TCP socket, and a received segment is
encapsulated within an IPv6 datagram, do not abuse the 'ipov' pointer
when registering trace records.  'ipov' is specific to IPv4, and
will therefore be uninitialized.

[This fandango is only necessary in the first place because of our
host-byte-order IP field pessimization.]

PR:		kern/60856
Submitted by:	Galois Zheng
2004-06-18 03:31:07 +00:00
Bruce M Simpson
da181cc144 Don't set FIN on a retransmitted segment after a FIN has been sent,
unless the segment really contains the last of the data for the stream.

PR:		kern/34619
Obtained from:	OpenBSD (tcp_output.c rev 1.47)
Noticed by:	Joseph Ishac
Reviewed by:	George Neville-Neil
2004-06-18 02:47:59 +00:00
Bruce M Simpson
27de0135ce Ensure that dst is bzeroed before calling rtalloc_ign(), to avoid possible
routing table corruption.

PR:		kern/40563, freebsd4/432 (KAME)
Obtained from:	NetBSD (in_gif.c rev 1.26.10.1)
Requested by:	Jean-Luc Richier
2004-06-18 02:04:07 +00:00
Max Laier
7c1fe95333 Commit pf version 3.5 and link additional files to the kernel build.
Version 3.5 brings:
 - Atomic commits of ruleset changes (reduce the chance of ending up in an
   inconsistent state).
 - A 30% reduction in the size of state table entries.
 - Source-tracking (limit number of clients and states per client).
 - Sticky-address (the flexibility of round-robin with the benefits of
   source-hash).
 - Significant improvements to interface handling.
 - and many more ...
2004-06-16 23:24:02 +00:00
Max Laier
a306c902b8 Prepare for pf 3.5 import:
- Remove pflog and pfsync modules. Things will change in such a fashion
   that there will be one module with pf+pflog that can be loaded into
   GENERIC without problems (which is what most people want). pfsync is no
   longer possible as a module.
 - Add multicast address for in-kernel multicast pfsync protocol. Protocol
   glue will follow once the import is done.
 - Add one more mbuf tag
2004-06-16 22:59:06 +00:00
Maxim Konovalov
ef14c36965 o connect(2): if there is no a route to the destination
do not pick up the first local ip address for the source
ip address, return ENETUNREACH instead.

Submitted by:	Gleb Smirnoff
Reviewed by:	-current (silence)
2004-06-16 10:02:36 +00:00
Bruce M Simpson
d420fcda27 Fix build for IPSEC && !INET6
PR:		kern/66125
Submitted by:	Cyrille Lefevre
2004-06-16 09:35:07 +00:00
Bruce M Simpson
49b19bfc47 Reverse a patch which has no effect on -CURRENT and should probably be
applied directly to -STABLE.

Noticed by:	iedowse
Pointy hat to:	bms
2004-06-16 08:50:14 +00:00
Bruce M Simpson
57ab3660ff In ip_forward(), when calculating the MTU in effect for an IPSEC transport
mode tunnel, take the per-route MTU into account, *if* and *only if* it
is non-zero (as found in struct rt_metrics/rt_metrics_lite).

PR:		kern/42727
Obtained from:	NetBSD (ip_input.c rev 1.151)
2004-06-16 08:33:09 +00:00
Bruce M Simpson
e6b0a57025 In ip_forward(), set m->m_pkthdr.len correctly such that the mbuf chain
is sane, and ipsec4_getpolicybyaddr() will therefore complete.

PR:		kern/42727
Obtained from:	KAME (kame/freebsd4/sys/netinet/ip_input.c rev 1.42)
2004-06-16 08:28:54 +00:00
Bruce M Simpson
34e3ccb34b Disconnect a temporarily-connected UDP socket in out-of-mbufs case. This
fixes the problem of UDP sockets getting wedged in a connected state (and
bound to their destination) under heavy load.
Temporary bind/connect should probably be deleted in future
as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993].

Notes:
 - INP_LOCK() is already held in udp_output(). The connection is in effect
   happening at a layer lower than the socket layer, therefore in theory
   socket locking should not be needed.
 - Inlining the in_pcbdisconnect() operation buys us nothing (in the case
   of the current state of the code), as laddr is not part of the
   inpcb hash or the udbinfo hash. Therefore there should be no need
   to rehash after restoring laddr in the error case (this was a
   concern of the original author of the patch).

PR:		kern/41765
Requested by:	gnn
Submitted by:	Jinmei Tatuya (with cleanups)
Tested by:	spray(8)
2004-06-16 05:41:00 +00:00
Robert Watson
a97719a4c5 Convert GIANT_REQUIRED to NET_ASSERT_GIANT for socket access. 2004-06-16 03:36:06 +00:00
Robert Watson
7721f5d760 Grab the socket buffer send or receive mutex when performing a
read-modify-write on the sb_state field.  This commit catches only
the "easy" ones where it doesn't interact with as yet unmerged
locking.
2004-06-15 03:51:44 +00:00
Robert Watson
c0b99ffa02 The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
2004-06-14 18:16:22 +00:00
Max Laier
02b199f158 Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
Doug Rabson
b8b3323469 Add a new driver to support IP over firewire. This driver is intended to
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.
2004-06-13 10:54:36 +00:00
Robert Watson
310e7ceb94 Socket MAC labels so_label and so_peerlabel are now protected by
SOCK_LOCK(so):

- Hold socket lock over calls to MAC entry points reading or
  manipulating socket labels.

- Assert socket lock in MAC entry point implementations.

- When externalizing the socket label, first make a thread-local
  copy while holding the socket lock, then release the socket lock
  to externalize to userspace.
2004-06-13 02:50:07 +00:00
Robert Watson
395a08c904 Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
Christian S.J. Peron
d316f2cf4f Modify ip fw so that whenever UID or GID constraints exist in a
ruleset, the pcb is looked up once per ipfw_chk() activation.

This is done by extracting the required information out of the PCB
and caching it to the ipfw_chk() stack. This should greatly reduce
PCB looking contention and speed up the processing of UID/GID based
firewall rules (especially with large UID/GID rulesets).

Some very basic benchmarks were taken which compares the number
of in_pcblookup_hash(9) activations to the number of firewall
rules containing UID/GID based contraints before and after this patch.

The results can be viewed here:
o http://people.freebsd.org/~csjp/ip_fw_pcb.png

Reviewed by:	andre, luigi, rwatson
Approved by:	bmilekic (mentor)
2004-06-11 22:17:14 +00:00
Robert Watson
c1d587c848 Remove unneeded Giant acquisition in divert_packet(), which is
left over from debug.mpsafenet affecting only the forwarding
plane.  Giant is now acquired in the ithread/netisr or in the
system call code.
2004-06-11 04:06:51 +00:00
Robert Watson
c14800e6ff Lock down parallel router_info list for tracking multicast IGMP
versions of various routers seen:

- Introduce igmp_mtx.
- Protect global variable 'router_info_head' and list fields
  in struct router_info with this mutex, as well as
  igmp_timers_are_running.
- find_rti() asserts that the caller acquires igmp_mtx.
- Annotate a failure to check the return value of
  MALLOC(..., M_NOWAIT).
2004-06-11 03:42:37 +00:00
Ruslan Ermilov
dd4d62c7d8 init_tables() must be run after sys/net/route.c:route_init(). 2004-06-10 20:20:37 +00:00
Ruslan Ermilov
cd8b5ae0ae Introduce a new feature to IPFW2: lookup tables. These are useful
for handling large sparse address sets.  Initial implementation by
Vsevolod Lobko <seva@ip.net.ua>, refined by me.

MFC after:	1 week
2004-06-09 20:10:38 +00:00
Hajimu UMEMOTO
cad1917d48 do not send icmp response if the original packet is encrypted.
Obtained from:	KAME
MFC after:	1 week
2004-06-07 09:56:59 +00:00
Bosko Milekic
ac830b58d1 Move the locking of the pcb into raw_output(). Organize code so
that m_prepend() is not called with possibility to wait while the
pcb lock is held.  What still needs revisiting is whether the
ripcbinfo lock is really required here.

Discussed with: rwatson
2004-06-03 03:15:29 +00:00
Poul-Henning Kamp
5dba30f15a add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
Poul-Henning Kamp
41ee9f1c69 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
Christian S.J. Peron
b5ef991561 Add a super-user check to ipfw_ctl() to make sure that the calling
process is a non-prison root. The security.jail.allow_raw_sockets
sysctl variable is disabled by default, however if the user enables
raw sockets in prisons, prison-root should not be able to interact
with firewall rule sets.

Approved by:	rwatson, bmilekic (mentor)
2004-05-25 15:02:12 +00:00
Yaroslav Tykhiy
4658dc8325 When checking for possible port theft, skip over a TCP inpcb
unless it's in the closed or listening state (remote address
== INADDR_ANY).

If a TCP inpcb is in any other state, it's impossible to steal
its local port or use it for port theft.  And if there are
both closed/listening and connected TCP inpcbs on the same
localIP:port couple, the call to in_pcblookup_local() will
find the former due to the design of that function.

No objections raised in:	-net, -arch
MFC after:			1 month
2004-05-20 06:35:02 +00:00
Maxim Konovalov
a49b21371a o Calculate a number of bytes to copy (cnt) correctly:
+----+-+-+-+-+----+----+- - - - - - - - - - - -  -+----+
  |    | |C| | |    |    |                          |    |
  | IP |N|O|L|P|    | IP |                          | IP |
  | #1 |O|D|E|T|    | #2 |                          | #n |
  |    |P|E|N|R|    |    |                          |    |
  +----+-+-+-+-+----+----+- - - - - - - - - - - -  -+----+
               ^    ^<---- cnt - (IPOPT_MINOFF - 1) ---->|
               |    |
src            |    +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr)
               |
dst            +-- cp[IPOPT_OFF + 1]

PR:		kern/66386
Submitted by:	Andrei Iltchenko
MFC after:	3 weeks
2004-05-11 19:14:44 +00:00
Maxim Konovalov
d0946241ac o IFNAMSIZ does include the trailing \0.
Approved by:	andre

o Document net.inet.icmp.reply_src.
2004-05-07 01:24:53 +00:00
Andre Oppermann
2bde81acd6 Provide the sysctl net.inet.ip.process_options to control the processing
of IP options.

 net.inet.ip.process_options=0  Ignore IP options and pass packets unmodified.
 net.inet.ip.process_options=1  Process all IP options (default).
 net.inet.ip.process_options=2  Reject all packets with IP options with ICMP
  filter prohibited message.

This sysctl affects packets destined for the local host as well as those
only transiting through the host (routing).

IP options do not have any legitimate purpose anymore and are only used
to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
stacks.

Reviewed by:	sam (mentor)
2004-05-06 18:46:03 +00:00
Robert Watson
c18b97c630 Switch to using the inpcb MAC label instead of socket MAC label when
labeling new mbufs created from sockets/inpcbs in IPv4.  This helps avoid
the need for socket layer locking in the lower level network paths
where inpcb locks are already frequently held where needed.  In
particular:

- Use the inpcb for label instead of socket in raw_append().
- Use the inpcb for label instead of socket in tcp_output().
- Use the inpcb for label instead of socket in tcp_respond().
- Use the inpcb for label instead of socket in tcp_twrespond().
- Use the inpcb for label instead of socket in syncache_respond().

While here, modify tcp_respond() to avoid assigning NULL to a stack
variable and centralize assertions about the inpcb when inp is
assigned.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-04 02:11:47 +00:00
Robert Watson
87f2bb8caf Assert inpcb lock in udp_append().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-04 01:08:15 +00:00
Robert Watson
cbe42d48bd Assert the inpcb lock on 'last' in udp_append(), since it's always
called with it, and also requires it.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-04 00:10:16 +00:00
Maxim Konovalov
1a0c4873ed o Fix misindentation in the previous commit. 2004-05-03 17:15:34 +00:00
Andre Oppermann
7652802b06 Back out a change that slipped into the previous commit for which other
supporting parts have not yet been committed.

Remove pre-mature IP options ignoring option.
2004-05-03 16:07:13 +00:00
Andre Oppermann
06bb56f43c Optimize IP fastforwarding some more:
o New function ip_findroute() to reduce code duplication for the
  route lookup cases. (luigi)

o Store ip_len in host byte order on the stack instead of using
  it via indirection from the mbuf.  This allows to defer the host
  byte conversion to a later point and makes a quicker fallback to
  normal ip_input() processing. (luigi)

o Check if route is dampned with RTF_REJECT flag and drop packet
  already here when ARP is unable to resolve destination address.
  An ICMP unreachable is sent to inform the sender.

o Check if interface output queue is full and drop packet already
  here.  No ICMP notification is sent because signalling source quench
  is depreciated.

o Check if media_state is down (used for ethernet type interfaces)
  and drop the packet already here.  An ICMP unreachable is sent to
  inform the sender.

o Do not account sent packets to the interface address counters.  They
  are only for packets with that 'ia' as source address.

o Update and clarify some comments.

Submitted by:	luigi (most of it)
2004-05-03 13:52:47 +00:00
Darren Reed
2f3f1e6773 Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier. 2004-05-02 15:10:17 +00:00
Darren Reed
7fbb130049 oops, I forgot this file in a prior commit (change was still sitting here,
uncommitted):

Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h
alongside all of the other macros that work ok mbuf's and tag's.
2004-05-02 15:07:37 +00:00
Darren Reed
ab884d993e Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside
all of the other macros that work ok mbuf's and tag's.
2004-05-02 06:36:30 +00:00
Bosko Milekic
5a59cefcd1 Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR.  The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely.  This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
2004-04-26 19:46:52 +00:00
Mike Silbersack
80dd2a81fb Tighten up reset handling in order to make reset attacks as difficult as
possible while maintaining compatibility with the widest range of TCP stacks.

The algorithm is as follows:

---
For connections in the ESTABLISHED state, only resets with
sequence numbers exactly matching last_ack_sent will cause a reset,
all other segments will be silently dropped.

For connections in all other states, a reset anywhere in the window
will cause the connection to be reset.  All other segments will be
silently dropped.
---

The necessity of accepting all in-window resets was discovered
by jayanth and jlemon, both of whom have seen TCP stacks that
will respond to FIN-ACK packets with resets not meeting the
strict last_ack_sent check.

Idea by:        Darren Reed
Reviewed by:    truckman, jlemon, others(?)
2004-04-26 02:56:31 +00:00
Luigi Rizzo
b2a8ac7ca5 Another small set of changes to reduce diffs with the new arp code. 2004-04-25 15:00:17 +00:00
Luigi Rizzo
491522eade remove a stale comment on the behaviour of arpresolve 2004-04-25 14:06:23 +00:00
Luigi Rizzo
cfff63f1b8 Start the arp timer at init time.
It runs so rarely that it makes no sense to wait until the first request.
2004-04-25 12:50:14 +00:00
Luigi Rizzo
cd46a114fc This commit does two things:
1. rt_check() cleanup:
    rt_check() is only necessary for some address families to gain access
    to the corresponding arp entry, so call it only in/near the *resolve()
    routines where it is actually used -- at the moment this is
    arpresolve(), nd6_storelladdr() (the call is embedded here),
    and atmresolve() (the call is just before atmresolve to reduce
    the number of changes).
    This change will make it a lot easier to decouple the arp table
    from the routing table.

    There is an extra call to rt_check() in if_iso88025subr.c to
    determine the routing info length. I have left it alone for
    the time being.

    The interface of arpresolve() and nd6_storelladdr() now changes slightly:
     + the 'rtentry' parameter (really a hint from the upper level layer)
       is now passed unchanged from *_output(), so it becomes the route
       to the final destination and not to the gateway.
     + the routines will return 0 if resolution is possible, non-zero
       otherwise.
     + arpresolve() returns EWOULDBLOCK in case the mbuf is being held
       waiting for an arp reply -- in this case the error code is masked
       in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
    Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
    and use the IFP2AC macro to access arpcom fields.
    This mostly affects the netatalk code.

=== Detailed changes: ===
net/if_arcsubr.c
   rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
   rt_check() cleanup

net/if_ethersubr.c
   rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
   rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
   rt_check() cleanup

netatalk/aarp.c
   arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
   arpcom untangling

netinet/if_ether.c
   rt_check() cleanup (change arpresolve)

netinet6/nd6.c
   rt_check() cleanup (change nd6_storelladdr)
2004-04-25 09:24:52 +00:00
Mike Silbersack
6b2fc10b64 Wrap two long lines in the previous commit. 2004-04-23 23:29:49 +00:00
Andre Oppermann
2d166c0202 Correct an edge case in tcp_mss() where the cached path MTU
from tcp_hostcache would have overridden a (now) lower MTU of
an interface or route that changed since first PMTU discovery.
The bug would have caused TCP to redo the PMTU discovery when
not strictly necessary.

Make a comment about already pre-initialized default values
more clear.

Reviewed by:	sam
2004-04-23 22:44:59 +00:00
Andre Oppermann
22b5770b99 Add the option versrcreach to verify that a valid route to the
source address of a packet exists in the routing table.  The
default route is ignored because it would match everything and
render the check pointless.

This option is very useful for routers with a complete view of
the Internet (BGP) in the routing table to reject packets with
spoofed or unrouteable source addresses.

Example:

 ipfw add 1000 deny ip from any to any not versrcreach

also known in Cisco-speak as:

  ip verify unicast source reachable-via any

Reviewed by:	luigi
2004-04-23 14:28:38 +00:00
Andre Oppermann
b62dccc7e5 Fix a potential race when purging expired hostcache entries.
Spotted by:	luigi
2004-04-23 13:54:28 +00:00
Mike Silbersack
174624e01d Take out an unneeded variable I forgot to remove in the last commit,
and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.
2004-04-22 08:34:55 +00:00
Mike Silbersack
6ac48b7409 Simplify random port allocation, and add net.inet.ip.portrange.randomized,
which can be used to turn off randomized port allocation if so desired.

Requested by:	alfred
2004-04-22 08:32:14 +00:00
Bruce M Simpson
de9f59f850 Fix a typo in a comment. 2004-04-20 19:04:24 +00:00
Mike Silbersack
6dd946b3f7 Switch from using sequential to random ephemeral port allocation,
implementation taken directly from OpenBSD.

I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
2004-04-20 06:45:10 +00:00
Mike Silbersack
c1537ef063 Enhance our RFC1948 implementation to perform better in some pathlogical
TIME_WAIT recycling cases I was able to generate with http testing tools.

In short, as the old algorithm relied on ticks to create the time offset
component of an ISN, two connections with the exact same host, port pair
that were generated between timer ticks would have the exact same sequence
number.  As a result, the second connection would fail to pass the TIME_WAIT
check on the server side, and the SYN would never be acknowledged.

I've "fixed" this by adding random positive increments to the time component
between clock ticks so that ISNs will *always* be increasing, no matter how
quickly the port is recycled.

Except in such contrived benchmarking situations, this problem should never
come up in normal usage...  until networks get faster.

No MFC planned, 4.x is missing other optimizations that are needed to even
create the situation in which such quick port recycling will occur.
2004-04-20 06:33:39 +00:00
Luigi Rizzo
ac912b2dc8 Replace Bcopy with 'the real thing' as in the rest of the file. 2004-04-18 11:45:49 +00:00
Luigi Rizzo
e6e51f0518 In an effort to simplify the routing code, try to deprecate rtalloc()
in favour of rtalloc_ign(), which is what would end up being called
anyways.

There are 25 more instances of rtalloc() in net*/ and
about 10 instances of rtalloc_ign()
2004-04-14 01:13:14 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Ruslan Ermilov
390cdc6a76 Fixed a bug in previous revision: compute the payload checksum before
we convert ip_len into a network byte order; in_delayed_cksum() still
expects it in host byte order.

The symtom was the ``in_cksum_skip: out of data by %d'' complaints
from the kernel.

To add to the previous commit log.  These fixes make tcpdump(1) happy
by not complaining about UDP/TCP checksum being bad for looped back
IP multicast when multicast router is deactivated.

Reported by:	Vsevolod Lobko
2004-04-07 10:01:39 +00:00
Bruce Evans
30a4ab088a Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h>
to implement this mistake.

Fixed some nearby style bugs (initialization in declaration, misformatting
of this initialization, missing blank line after the declaration, and
comparision of the non-boolean result of the initialization with 0 using
"!".  In KNF, "!" is not even used to compare booleans with 0).
2004-04-06 10:59:11 +00:00
Robert Watson
47f32f6fa6 Two missed in previous commit -- compare pointer with NULL rather than
using it as a boolean.
2004-04-05 00:52:05 +00:00
Robert Watson
24459934e9 Prefer NULL to 0 when checking pointer values as integers or booleans. 2004-04-05 00:49:07 +00:00
Pawel Jakub Dawidek
52710de1cb Fix a panic possibility caused by returning without releasing locks.
It was fixed by moving problemetic checks, as well as checks that
doesn't need locking before locks are acquired.

Submitted by:		Ryan Sommers <ryans@gamersimpact.com>
In co-operation with:	cperciva, maxim, mlaier, sam
Tested by:		submitter (previous patch), me (current patch)
Reviewed by:		cperciva, mlaier (previous patch), sam (current patch)
Approved by:		sam
Dedicated to:		enough!
2004-04-04 20:14:55 +00:00
Luigi Rizzo
f7c5baa1c6 + arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
  to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
  etc.) for future use.

+ struct route: remove an unused field, move close to each
  other some fields that might likely go away in the future
2004-04-04 06:14:55 +00:00
Daniel Eischen
ab39bc9a92 Unbreak natd.
Reported and submitted by:	Sean McNeil (sean at mcneil.com)
2004-04-02 17:57:57 +00:00
Dag-Erling Smørgrav
e271f829b8 Raise WARNS level to 2. 2004-03-31 21:33:55 +00:00
Dag-Erling Smørgrav
2871c50186 Deal with aliasing warnings.
Reviewed by:	ru
Approved by:	silence on the lists
2004-03-31 21:32:58 +00:00
Robert Watson
7101d752b2 Invert the logic of NET_LOCK_GIANT(), and remove the one reference to it.
Previously, Giant would be grabbed at entry to the IP local delivery code
when debug.mpsafenet was set to true, as that implied Giant wouldn't be
grabbed in the driver path.  Now, we will use this primitive to
conditionally grab Giant in the event the entire network stack isn't
running MPSAFE (debug.mpsafenet == 0).
2004-03-28 23:12:19 +00:00
Pawel Jakub Dawidek
56dc72c3b6 Remove unused argument. 2004-03-28 15:48:00 +00:00
Pawel Jakub Dawidek
b0330ed929 Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
	- in_pcbbind_setup(),
	- in_pcbconnect(),
	- in_pcbconnect_setup(),
	- in6_pcbbind(),
	- in6_pcbconnect(),
	- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by:	rwatson
Reviewed by:	rwatson, ume
2004-03-27 21:05:46 +00:00
Pawel Jakub Dawidek
6823b82399 Remove unused argument.
Reviewed by:	ume
2004-03-27 20:41:32 +00:00
Hajimu UMEMOTO
a5d1aae31a Validate IPv6 socket options more carefully to avoid a panic.
PR:		kern/61513
Reviewed by:	cperciva, nectar
2004-03-26 19:52:18 +00:00
Pawel Jakub Dawidek
8da601dfb7 Remove unused function.
It was used in FreeBSD 4.x, but now we're using cr_canseesocket().
2004-03-25 15:12:12 +00:00
Ruslan Ermilov
26f16ebeb1 Untangle IP multicast routing interaction with delayed payload checksums.
Compute the payload checksum for a locally originated IP multicast where
God intended, in ip_mloopback(), rather than doing it in ip_output() and
only when multicast router is active.  This is more correct as we do not
fool ip_input() that the packet has the correct payload checksum when in
fact it does not (when multicast router is inactive).  This is also more
efficient if we don't join the multicast group we send to, thus allowing
the hardware to checksum the payload.
2004-03-25 08:46:27 +00:00
Robert Watson
bdae44a844 Lock down global variables in if_gre:
- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this
  from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre
  interfaces during encapsulation.  Add a wonking comment on how we need
  some sort of drain/reference count mechanism to keep gre references
  alive while in use and simultaneous destroy.

This commit does not lockdown softc data, which follows in a future
commit.
2004-03-22 16:04:43 +00:00
Matthew N. Dodd
2964fb6538 - Fix indentation lost by 'diff -b'.
- Un-wrap short line.
2004-03-21 18:51:26 +00:00
Matthew N. Dodd
64bf80ce1b Remove interface type specific code from arprequest(), and in_arpinput().
The AF_ARP case in the (*if_output)() routine will handle the interface type
specific bits.

Obtained from:	NetBSD
2004-03-21 06:36:05 +00:00
Dag-Erling Smørgrav
f0f93429cf Run through indent(1) so I can read the code without getting a headache.
The result isn't quite knf, but it's knfer than the original, and far
more consistent.
2004-03-16 21:30:41 +00:00
Matthew N. Dodd
e952fa39de De-register. 2004-03-14 00:44:11 +00:00
Robert Watson
fe5a02c927 Lock down IP-layer encapsulation library:
- Add encapmtx to protect ip_encap.c global variables (encapsulation
   list).
 - Unifdef #ifdef 0 pieces of encap_init() which was (and now really
   is) basically a no-op.
 - Lock encapmtx when walking encaptab, modifying it, comparing
   entries, etc.
 - Remove spl's.

Note that currently there's no facilite to make sure outstanding
use of encapsulation methods on a table entry have drained bfore
we allow a table entry to be removed.  As such, it's currently the
caller's responsibility to make sure that draining takes place.

Reviewed by:	mlaier
2004-03-10 02:48:50 +00:00
Robert Watson
846840ba95 Scrub unused variable zeroin_addr. 2004-03-10 01:01:04 +00:00
Jeffrey Hsu
a062038267 To comply with the spec, do not copy the TOS from the outer IP
header to the inner IP header of the PIM Register if this is a PIM
Null-Register message.

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2004-03-08 07:47:27 +00:00
Jeffrey Hsu
4c9792f9d3 Include <sys/types.h> for autoconf/automake detection.
Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2004-03-08 07:45:32 +00:00
Max Laier
b81dae751b Add some missing DUMMYNET_UNLOCK() in config_pipe().
Noticed by:	Simon Coggins
Approved by:	bms(mentor)
2004-03-03 01:33:22 +00:00
Max Laier
4672d81921 Two minor follow-ups on the MT_TAG removal:
ifp is now passed explicitly to ether_demux; no need to look it up again.
Make mtag a global var in ip_input.

Noticed by:	rwatson
Approved by:	bms(mentor)
2004-03-02 14:37:23 +00:00
Robert Watson
6200a93f82 Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT()
to NET_UNLOCK_GIANT().  While they are used in similar ways, the
semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT()
directly wrap mutex lock and unlock operations, whereas drop/pickup
special case the handling of Giant recursion.  Add a comment saying
as much.

Add NET_ASSERT_GIANT(), which conditionally asserts Giant based
on the value of debug_mpsafenet.
2004-03-01 22:37:01 +00:00
Hajimu UMEMOTO
04d3a45241 fix -O0 compilation without INET6.
Pointed out by:	ru
2004-03-01 19:10:31 +00:00
Robert Watson
768bbd68cc Remove unneeded {} originally used to hold local variables for dummynet
in a code block, as the variable is now gone.

Submitted by:	sam
2004-02-28 19:50:43 +00:00
Robert Watson
a7b6a14aee Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These
were needed by the MAC Framework until inpcbs gained labels.

Submitted by:	sam
2004-02-28 15:12:20 +00:00
Max Laier
25a4adcec4 Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)
2004-02-26 04:27:55 +00:00
Max Laier
cc5934f5af Tweak existing header and other build infrastructure to be able to build
pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile
(i.e. do not connect it to any (automatic) builds - yet).

Approved by: bms(mentor)
2004-02-26 03:53:54 +00:00
Don Lewis
47934cef8f Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
Max Laier
ac9d7e2618 Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam
2004-02-25 19:55:29 +00:00
Bruce Evans
0613995bd0 Fixed namespace pollution in rev.1.74. Implementation of the syncache
increased <netinet/tcp_var>'s already large set of prerequisites, and
this was handled badly.  Just don't declare the complete syncache struct
unless <netinet/pcb.h> is included before <netinet/tcp_var.h>.

Approved by:	jlemon (years ago, for a more invasive fix)
2004-02-25 13:03:01 +00:00
Bruce Evans
a545b1dc4d Don't use the negatively-opaque type uma_zone_t or be chummy with
<vm/uma.h>'s idempotency indentifier or its misspelling.
2004-02-25 11:53:19 +00:00
Jeffrey Hsu
89c02376fc Relax a KASSERT condition to allow for a valid corner case where
the FIN on the last segment consumes an extra sequence number.

Spurious panic reported by Mike Silbersack <silby@silby.com>.
2004-02-25 08:53:17 +00:00
Andre Oppermann
12e2e97051 Convert the tcp segment reassembly queue to UMA and limit the maximum
amount of segments it will hold.

The following tuneables and sysctls control the behaviour of the tcp
segment reassembly queue:

 net.inet.tcp.reass.maxsegments (loader tuneable)
  specifies the maximum number of segments all tcp reassemly queues can
  hold (defaults to 1/16 of nmbclusters).

 net.inet.tcp.reass.maxqlen
  specifies the maximum number of segments any individual tcp session queue
  can hold (defaults to 48).

 net.inet.tcp.reass.cursegments (readonly)
  counts the number of segments currently in all reassembly queues.

 net.inet.tcp.reass.overflows (readonly)
  counts how often either the global or local queue limit has been reached.

Tested by:	bms, silby
Reviewed by:	bms, silby
2004-02-24 15:27:41 +00:00
Pawel Jakub Dawidek
41fe0c8ad5 Fixed ucred structure leak.
Approved by:	scottl (mentor)
PR:		54163
MFC after:	3 days
2004-02-19 14:13:21 +00:00
Max Laier
36e8826ffb Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)
2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO
da0f40995d IPSEC and FAST_IPSEC have the same internal API now;
so merge these (IPSEC has an extra ipsecstat)

Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
2004-02-17 14:02:37 +00:00
Bruce M Simpson
88f6b0435e Shorten the name of the socket option used to enable TCP-MD5 packet
treatment.

Submitted by:	Vincent Jardin
2004-02-16 22:21:16 +00:00
Hajimu UMEMOTO
70dbc6cbfc don't update outgoing ifp, if ipsec tunnel mode encapsulation
was not made.

Obtained from:	KAME
2004-02-16 17:05:06 +00:00
Bruce M Simpson
91179f796d Spell types consistently throughout this file. Do not use the __packed attribute, as we are often #include'd from userland without <sys/cdefs.h> in front of us, and it is not strictly necessary.
Noticed by:	Sascha Blank
2004-02-16 14:40:56 +00:00
Bruce M Simpson
32ff046639 Final brucification pass. Spell types consistently (u_int). Remove bogus
casts. Remove unnecessary parenthesis.

Submitted by:	bde
2004-02-14 21:49:48 +00:00
Max Laier
97075d0c0a Do not expose ip_dn_find_rule inline function to userland and unbreak world.
----------------------------------------------------------------------
2004-02-13 22:26:36 +00:00
Max Laier
189a0ba4e7 Do not check receive interface when pfil(9) hook changed address.
Approved by: bms(mentor)
2004-02-13 19:20:43 +00:00
Max Laier
1094bdca51 This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson
2004-02-13 19:14:16 +00:00
Bruce M Simpson
265ed01285 Brucification.
Submitted by:	bde
2004-02-13 18:21:45 +00:00
Hajimu UMEMOTO
efddf5c64d supported IPV6_RECVPATHMTU socket option.
Obtained from:	KAME
2004-02-13 14:50:01 +00:00
Bruce M Simpson
b30190b542 Update the prototype for tcpsignature_apply() to reflect the spelling of
the types used by m_apply()'s callback function, f, as documented in mbuf(9).

Noticed by:	njl
2004-02-12 20:16:09 +00:00
Bruce M Simpson
bca0e5bfc3 style(9) pass; whitespace and comments.
Submitted by:	njl
2004-02-12 20:12:48 +00:00
Bruce M Simpson
a0194ef1ea Remove an unnecessary initialization that crept in from the code which
verifies TCP-MD5 digests.

Noticed by:	njl
2004-02-12 20:08:28 +00:00
Bruce M Simpson
45d370ee8b Fix a typo; left out preprocessor conditional for sigoff variable, which
is only used by TCP_SIGNATURE code.

Noticed by:	Roop Nanuwa
2004-02-11 09:46:54 +00:00
Bruce M Simpson
1cfd4b5326 Initial import of RFC 2385 (TCP-MD5) digest support.
This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.

For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.

Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.

There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.

Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.

This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.

Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.

Sponsored by:	sentex.net
2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO
f073c60f73 pass pcb rather than so. it is expected that per socket policy
works again.
2004-02-03 18:20:55 +00:00
Andre Oppermann
b74d89bbbb Add sysctl net.inet.icmp.reply_src to specify the interface name
used for the ICMP reply source in reponse to packets which are not
directly addressed to us.  By default continue with with normal
source selection.

Reviewed by:	bms
2004-02-02 22:53:16 +00:00
Andre Oppermann
1488eac8ec More verbose description of the source ip address selection for ICMP replies.
Reviewed by:	bms
2004-02-02 22:17:09 +00:00
Poul-Henning Kamp
be8a62e821 Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead.  Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.
2004-01-31 10:40:25 +00:00
Maxim Sobolev
4c83789253 Remove NetBSD'isms (add FreeBSD'isms?), which makes gre(4) working again. 2004-01-30 09:03:01 +00:00
Ruslan Ermilov
0ca2861fc9 Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls. 2004-01-27 22:17:39 +00:00
Maxim Sobolev
7735aeb9bb Add support for WCCPv2. It should be enablem manually using link2
ifconfig(8) flag since header for version 2 is the same but IP payload
is prepended with additional 4-bytes field.

Inspired by:	Roman Synyuk <roman@univ.kiev.ua>
MFC after:	2 weeks
2004-01-26 12:33:56 +00:00
Maxim Sobolev
6e628b8187 (whilespace-only)
Kill trailing spaces.
2004-01-26 12:21:59 +00:00
Andre Oppermann
241f1e33b1 Remove leftover FREE() from changes in rev 1.50.
Noticed by:	Jun Kuriyama <kuriyama@imgsrc.co.jp>
2004-01-23 01:39:12 +00:00
Andre Oppermann
201d185b69 Split the overloaded variable 'win' into two for their specific purposes:
recwin and sendwin.  This removes a big source of confusion and makes
following the code much easier.

Reviewed by:	sam (mentor)
Obtained from:	DragonFlyBSD rev 1.6 (hsu)
2004-01-22 23:22:14 +00:00
Andre Oppermann
1ddba8d63e Move the reduction by one of the syncache limit after the zone has been
allocated.

Reviewed by:    sam (mentor)
Obtained from:  DragonFlyBSD rev 1.6 (hsu)
2004-01-22 23:14:48 +00:00
Andre Oppermann
73080de2be Remove an unused variable and put the sockaddr_in6 onto the stack instead
of malloc'ing it.

Reviewed by:	sam (mentor)
Obtained from:	DragonFlyBSD rev 1.6 (hsu)
2004-01-22 23:10:11 +00:00
Jeffrey Hsu
61a36e3dfc Merge from DragonFlyBSD rev 1.10:
date: 2003/09/02 10:04:47;  author: hsu;  state: Exp;  lines: +5 -6
Account for when Limited Transmit is not congestion window limited.

Obtained from:	DragonFlyBSD
2004-01-20 21:40:25 +00:00
Poul-Henning Kamp
5e289f9eb6 Mostly mechanical rework of libalias:
Makes it possible to have multiple packet aliasing instances in a
single process by moving all static and global variables into an
instance structure called "struct libalias".

Redefine a new API based on s/PacketAlias/LibAlias/g

Add new "instance" argument to all functions in the new API.

Implement old API in terms of the new API.
2004-01-17 10:52:21 +00:00
Hajimu UMEMOTO
548c676b32 do not deref freed pointer
Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
Reviewed by:	itojun
2004-01-13 09:51:47 +00:00
Andre Oppermann
bed824fa90 Disable the minmssoverload connection drop by default until the detection
logic is refined.
2004-01-12 15:46:04 +00:00
Don Lewis
e29ef13f6c Check that sa_len is the appropriate value in tcp_usr_bind(),
tcp6_usr_bind(), tcp_usr_connect(), and tcp6_usr_connect() before checking
to see whether the address is multicast so that the proper errno value
will be returned if sa_len is incorrect.  The checks are identical to the
ones in in_pcbbind_setup(), in6_pcbbind(), and in6_pcbladdr(), which are
called after the multicast address check passes.

MFC after:	30 days
2004-01-10 08:53:00 +00:00
Andre Oppermann
1ddc17c1d5 Reduce TCP_MINMSS default to 216. The AX.25 protocol (packet radio)
is frequently used with an MTU of 256 because of slow speeds and a
high packet loss rate.
2004-01-09 14:14:10 +00:00
Andre Oppermann
53369ac9bb Limiters and sanity checks for TCP MSS (maximum segement size)
resource exhaustion attacks.

For network link optimization TCP can adjust its MSS and thus
packet size according to the observed path MTU.  This is done
dynamically based on feedback from the remote host and network
components along the packet path.  This information can be
abused to pretend an extremely low path MTU.

The resource exhaustion works in two ways:

 o during tcp connection setup the advertized local MSS is
   exchanged between the endpoints.  The remote endpoint can
   set this arbitrarily low (except for a minimum MTU of 64
   octets enforced in the BSD code).  When the local host is
   sending data it is forced to send many small IP packets
   instead of a large one.

   For example instead of the normal TCP payload size of 1448
   it forces TCP payload size of 12 (MTU 64) and thus we have
   a 120 times increase in workload and packets. On fast links
   this quickly saturates the local CPU and may also hit pps
   processing limites of network components along the path.

   This type of attack is particularly effective for servers
   where the attacker can download large files (WWW and FTP).

   We mitigate it by enforcing a minimum MTU settable by sysctl
   net.inet.tcp.minmss defaulting to 256 octets.

 o the local host is reveiving data on a TCP connection from
   the remote host.  The local host has no control over the
   packet size the remote host is sending.  The remote host
   may chose to do what is described in the first attack and
   send the data in packets with an TCP payload of at least
   one byte.  For each packet the tcp_input() function will
   be entered, the packet is processed and a sowakeup() is
   signalled to the connected process.

   For example an attack with 2 Mbit/s gives 4716 packets per
   second and the same amount of sowakeup()s to the process
   (and context switches).

   This type of attack is particularly effective for servers
   where the attacker can upload large amounts of data.
   Normally this is the case with WWW server where large POSTs
   can be made.

   We mitigate this by calculating the average MSS payload per
   second.  If it goes below 'net.inet.tcp.minmss' and the pps
   rate is above 'net.inet.tcp.minmssoverload' defaulting to
   1000 this particular TCP connection is resetted and dropped.

MITRE CVE:	CAN-2004-0002
Reviewed by:	sam (mentor)
MFC after:	1 day
2004-01-08 17:40:07 +00:00
Andre Oppermann
bf87c82ebb If path mtu discovery is enabled set the DF bit in all cases we
send packets on a tcp connection.

PR:		kern/60889
Tested by:	Richard Wendland <richard@wendland.org.uk>
Approved by:	re (scottl)
2004-01-08 11:17:11 +00:00
Andre Oppermann
e0f630ea7a Do not set the ip_id to zero when DF is set on packet and
restore the general pre-randomid behaviour.

Setting the ip_id to zero causes several problems with
packet reassembly when a device along the path removes
the DF bit for some reason.

Other BSD and Linux have found and fixed the same issues.

PR:		kern/60889
Tested by:	Richard Wendland <richard@wendland.org.uk>
Approved by:	re (scottl)
2004-01-08 11:13:40 +00:00
Andre Oppermann
dba7bc6a65 Enable the following TCP options by default to give it more exposure:
rfc3042  Limited retransmit
 rfc3390  Increasing TCP's initial congestion Window
 inflight TCP inflight bandwidth limiting

All my production server have it enabled and there have been no
issues.  I am confident about having them on by default and it gives
us better overall TCP performance.

Reviewed by:	sam (mentor)
2004-01-06 23:29:46 +00:00
Andre Oppermann
87c3bd2755 According to RFC1812 we have to ignore ICMP redirects when we
are acting as router (ipforwarding enabled).

This doesn't fix the problem that host routes from ICMP redirects
are never removed from the kernel routing table but removes the
problem for machines doing packet forwarding.

Reviewed by:	sam (mentor)
2004-01-06 23:20:07 +00:00
Ruslan Ermilov
3b95e1346a Document the net.inet.ip.subnets_are_local sysctl. 2003-12-30 16:05:03 +00:00
Maxim Sobolev
73d7ddbc56 Sync with NetBSD:
if_gre.c rev.1.41-1.49

 o Spell output with two ts.
 o Remove assigned-to but not used variable.
 o fix grammatical error in a diagnostic message.
 o u_short -> u_int16_t.
 o gi_len is ip_len, so it has to be network byteorder.

if_gre.h rev.1.11-1.13

 o prototype must not have variable name.
 o u_short -> u_int16_t.
 o Spell address with two d's.

ip_gre.c rev.1.22-1.29

 o KNF - return is not a function.
 o The "osrc" variable in gre_mobile_input() is only ever set but not
   referenced; remove it.
 o correct (false) assumptions on mbuf chain.  not sure if it really helps, but
   anyways, it is necessary to perform m_pullup.
 o correct arg to m_pullup (need to count IP header size as well).
 o remove redundant adjustment of m->m_pkthdr.len.
 o clear m_flags just for safety.
 o tabify.
 o u_short -> u_int16_t.

MFC after:	2 weeks
2003-12-30 11:41:43 +00:00
Sam Leffler
437ffe1823 o eliminate widespread on-stack mbuf use for bpf by introducing
a new bpf_mtap2 routine that does the right thing for an mbuf
  and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
  when prepending the address family (several places were assuming
  sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
  mbufs have been eliminated; this may better be moved to the bpf
  routines

Reviewed by:	arch@ and several others
2003-12-28 03:56:00 +00:00
Maxim Konovalov
fad1d65260 o Fix a comment: softticks lives in sys/kern/kern_timeout.c.
PR:		kern/60613
Submitted by:	Gleb Smirnoff
MFC after:	3 days
2003-12-27 14:08:53 +00:00
Hajimu UMEMOTO
8b8a0cef40 NULL is not 0.
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
2003-12-24 18:22:04 +00:00
Ruslan Ermilov
3117579171 I didn't notice it right away, but check the right length too. 2003-12-23 14:08:50 +00:00
Ruslan Ermilov
78e2d2bd28 Fix a problem introduced in revision 1.84: m_pullup() does not
necessarily return the same mbuf chain so we need to recompute
mtod() consumers after pulling up.
2003-12-23 13:33:23 +00:00
Peter Wemm
a89ec05e3e Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.
2003-12-23 02:36:43 +00:00
Sam Leffler
ededbec187 o move mutex init/destroy logic to the module load/unload hooks;
otherwise they are initialized twice when the code is statically
  configured in the kernel because the module load method gets
  invoked before the user application calls ip_mrouter_init
o add a mutex to synchronize the module init/done operations; this
  sort of was done using the value of ip_mroute but X_ip_mrouter_done
  sets it to NULL very early on which can lead to a race against
  ip_mrouter_init--using the additional mutex means this is safe now
o don't call ip_mrouter_reset from ip_mrouter_init; this now happens
  once at module load and X_ip_mrouter_done does the appropriate
  cleanup work to insure the data structures are in a consistent
  state so that a subsequent init operation inherits good state

Reviewed by:	juli
2003-12-20 18:32:48 +00:00
John Baldwin
a5b061f9d2 Fix some becuase -> because typos.
Reported by:	Marco Wertejuk <wertejuk@mwcis.com>
2003-12-17 16:12:01 +00:00
Robert Watson
2d92ec9858 Switch TCP over to using the inpcb label when responding in timed
wait, rather than the socket label.  This avoids reaching up to
the socket layer during connection close, which requires locking
changes.  To do this, introduce MAC Framework entry point
mac_create_mbuf_from_inpcb(), which is called from tcp_twrespond()
instead of calling mac_create_mbuf_from_socket() or
mac_create_mbuf_netlayer().  Introduce MAC Policy entry point
mpo_create_mbuf_from_inpcb(), and implementations for various
policies, which generally just copy label data from the inpcb to
the mbuf.  Assert the inpcb lock in the entry point since we
require consistency for the inpcb label reference.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-12-17 14:55:11 +00:00
Maxim Konovalov
1c86761b2a o IN_MULTICAST wants an address in host byte order.
PR:		kern/60304
Submitted by:	demon
MFC after:	1 week
2003-12-16 18:21:47 +00:00
Maksim Yevmenkin
a6a66f5c4c Do not panic when flushing dummynet firewall rules
Reviewed by: andre
Approved by: re (scottl)
2003-12-06 09:01:25 +00:00
Andre Oppermann
f5bd8e9aff Swap destination and source arguments of two bcopy() calls.
Before committing the initial tcp_hostcache I changed them from memcpy()
to conform with FreeBSD style without realizing the difference in argument
definition.

This fixes hostcache operation for IPv6 (in general and explicitly IPv6
path mtu discovery) and T/TCP (RFC1644).

Submitted by:	Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Approved by:	re (rwatson)
2003-12-02 21:25:12 +00:00
Sam Leffler
d559f5c3d8 Include opt_ipsec.h so IPSEC/FAST_IPSEC is defined and the appropriate
code is compiled in to support the O_IPSEC operator.  Previously no
support was included and ipsec rules were always matching.  Note that
we do not return an error when an ipsec rule is added and the kernel
does not have IPsec support compiled in; this is done intentionally
but we may want to revisit this (document this in the man page).

PR:		58899
Submitted by:	Bjoern A. Zeeb
Approved by:	re (rwatson)
2003-12-02 00:23:45 +00:00
Andre Oppermann
cd6c4060c8 Fix an optimization where I made an ifdef'd out section to broad.
When the hostcache bucket limit is reached the last bucket wasn't
removed from the bucket row but inserted a few lines later at the
bucket row head again.  This leads to infinite loop when the same
bucket row is accessed the next time for a lookup/insert or purge
action.

Tested by:	imp, Matt Smith
Approved by:	re (rwatson)
2003-11-28 16:33:03 +00:00
Andre Oppermann
623f556031 Fix verify_rev_path() function. The author of this function tried to
cut corners which completely broke down when the routing table locking
was introduced.

Reviewed by:	sam (mentor)
Approved by:	re (rwatson)
2003-11-27 09:40:13 +00:00
Andre Oppermann
0cfbbe3bde Make sure all uses of stack allocated struct route's are properly
zeroed.  Doing a bzero on the entire struct route is not more
expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst.

Reviewed by:	sam (mentor)
Approved by:	re  (scottl)
2003-11-26 20:31:13 +00:00
Sam Leffler
5bd311a566 Split the "inp" mutex class into separate classes for each of divert,
raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness
complaints.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2003-11-26 01:40:44 +00:00
Andre Oppermann
943ae30252 Restructure a too broad ifdef which was disabling the setting of the
tcp flightsize sysctl value for local networks in the !INET6 case.

Approved by:	re (scottl)
2003-11-25 20:58:59 +00:00
Sam Leffler
6714d7c751 Correct a problem where ipfw-generated packets were being returned
for ipfw processing w/o an indication the packets were generated
by ipfw--and so should not be processed (this manifested itself
as a LOR.)  The flag bit in the mbuf that was used to mark the
packets was not listed in M_COPYFLAGS so if a packet had a header
prepended (as done by IPsec) the flag was lost.  Correct this by
defining a new M_PROTO6 flag and use it to mark packets that need
this processing.

Reviewed by:	bms
Approved by:	re (rwatson)
MFC after:	2 weeks
2003-11-24 03:57:03 +00:00
Sam Leffler
6a3ca7514d Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines
potentially transmit packets that may enter KAME IPsec w/o Giant if the
callouts are marked MPSAFE.

Reviewed by:	ume
Approved by:	re (rwatson)
2003-11-23 18:13:41 +00:00
Thomas Moestl
1f831750b5 bzero() the the sockaddr used for the destination address for
rtalloc_ign() in in_pcbconnect_setup() before it is filled out.
Otherwise, stack junk would be left in sin_zero, which could
cause host routes to be ignored because they failed the comparison
in rn_match().
This should fix the wrong source address selection for connect() to
127.0.0.1, among other things.

Reviewed by:	sam
Approved by:	re (rwatson)
2003-11-23 03:02:00 +00:00
Andre Oppermann
97d8d152c2 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
Andre Oppermann
26d02ca7ba Remove RTF_PRCLONING from routing table and adjust users of it
accordingly.  The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache.  The
network stack remains fully useable with this change.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 19:47:31 +00:00
Maxim Konovalov
dbf7b38125 Fix an arguments order in check_uidgid() call.
PR:		kern/59314
Submitted by:	Andrey V. Shytov
Approved by:	re (rwatson, jhb)
2003-11-20 10:28:33 +00:00
Robert Watson
a557af222b Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
Olivier Houchard
8c8268cb4f In rip_abort(), unlock the inpcb if we didn't detach it, or we may
recurse on the lock before destroying the mutex.

Submitted by:	sam
2003-11-17 19:21:53 +00:00
Brian Feldman
633461295a Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but
do not have mh_nextpkt initialized.  Somtimes what's there is "1", and the
ip_input() code pukes trying to m_free() it, rendering divert sockets and
such broken.
This really underscores the need to get rid of MT_TAG.

Reviewed by:	rwatson
2003-11-17 03:17:49 +00:00
Andre Oppermann
be7e82e44a Make two casts correct for all types of 64bit platforms.
Explained by:	bde
2003-11-16 12:50:33 +00:00
Andre Oppermann
df903fee84 Correct a cast to make it compile on 64bit platforms (noticed by tinderbox)
and remove two unneccessary variable initializations.
Make the introduction comment more clear with regard which parts of
the packet are touched.

Requested by:	luigi
2003-11-15 17:03:37 +00:00
Andre Oppermann
c76ff7084f Make ipstealth global as we need it in ip_fastforward too. 2003-11-15 01:45:56 +00:00
Andre Oppermann
02c1c7070e Remove the global one-level rtcache variable and associated
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adopt all its callers to this and make ip_output() callable
with NULL rt pointer.

Reviewed by:	sam (mentor)
2003-11-14 21:48:57 +00:00
Andre Oppermann
9188b4a169 Introduce ip_fastforward and remove ip_flow.
Short description of ip_fastforward:

 o adds full direct process-to-completion IPv4 forwarding code
 o handles ip fragmentation incl. hw support (ip_flow did not)
 o sends icmp needfrag to source if DF is set (ip_flow did not)
 o supports ipfw and ipfilter (ip_flow did not)
 o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
 o returns anything it can't handle back to normal ip_input

Enable with sysctl -w net.inet.ip.fastforwarding=1

Reviewed by:	sam (mentor)
2003-11-14 21:02:22 +00:00
Sam Leffler
f7bbe2c0f1 add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb)
Supported by:	FreeBSD Foundation
2003-11-13 05:18:23 +00:00
Sam Leffler
1b73ca0bf1 o reorder some locking asserts to reflect the order of the locks
o correct a read-lock assert in in_pcblookup_local that should be
  a write-lock assert (since time wait close cleanups may alter state)

Supported by:	FreeBSD Foundation
2003-11-13 05:16:56 +00:00
Andre Oppermann
16d6c90f5d Move global variables for icmp_input() to its stack. With SMP or
preemption two CPUs can be in the same function at the same time
and clobber each others variables.  Remove register declaration
from local variables.

Reviewed by:	sam (mentor)
2003-11-13 00:32:13 +00:00
Andre Oppermann
2683ceb661 Do not fragment a packet with hardware assistance if it has the DF
bit set.

Reviewed by:	sam (mentor)
2003-11-12 23:35:40 +00:00
Bruce M Simpson
83453a06de Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path.
This switch toggles between strict multicast delivery, and traditional
multicast delivery.

The traditional (default) behaviour is to deliver multicast datagrams to all
sockets which are members of that group, regardless of the network interface
where the datagrams were received.

The strict behaviour is to deliver multicast datagrams received on a
particular interface only to sockets whose membership is bound to that
interface.

Note that as a matter of course, multicast consumers specifying INADDR_ANY
for their interface get joined on the interface where the default route
happens to be bound. This switch has no effect if the interface which the
consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING.

The original patch has been cleaned up somewhat from that submitted. It has
been tested on a multihomed machine with multiple QuickTime RTP streams
running over the local switch, which doesn't do IGMP snooping.

PR:		kern/58359
Submitted by:	William A. Carrel
Reviewed by:	rwatson
MFC after:	1 week
2003-11-12 20:17:11 +00:00
Andre Oppermann
122aad88d5 dropwithreset is not needed in this case as tcp_drop() is already notifying
the other side. Before we were sending two RST packets.
2003-11-12 19:38:01 +00:00
Robert Watson
eca8a663d4 Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c).  This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE.  This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time.  This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change.  Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from:	bmilekic
Obtained from:		TrustedBSD Project
Sponsored by:		DARPA, Network Associates Laboratories
2003-11-12 03:14:31 +00:00
Sam Leffler
a0bf1601a7 correct typos
Pointed out by:	Mike Silbersack
2003-11-11 18:16:54 +00:00
Sam Leffler
3d0b255a9a o add missing inpcb locking in tcp_respond
o replace spl's with lock assertions

Supported by:	FreeBSD Foundation
2003-11-11 17:54:47 +00:00
Sam Leffler
383df78dc8 use Giant-less callouts when debug_mpsafenet is non-zero
Supported by:	FreeBSD Foundation
2003-11-10 23:29:33 +00:00
Ian Dowse
3ab2096b80 In in_pcbconnect_setup(), don't use the cached inp->inp_route unless
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.

Reviewed by:	sam
2003-11-10 22:45:37 +00:00
Jeffrey Hsu
1ce43e2348 Mark TCP syncache timer as not Giant-free ready yet. 2003-11-10 20:42:04 +00:00
Sam Leffler
7138d65c3f replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by:	FreeBSD Foundation
2003-11-08 23:36:32 +00:00
Sam Leffler
252f24a2cf divert socket fixups:
o pickup Giant in divert_packet to protect sbappendaddr since it
  can be entered through MPSAFE callouts or through ip_input when
  mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
  on an ICMP redirect (may not be needed)

Supported by:	FreeBSD Foundation
2003-11-08 23:09:42 +00:00
Sam Leffler
8484384564 assert optional inpcb is passed in locked
Supported by:	FreeBSD Foundation
2003-11-08 23:03:29 +00:00
Sam Leffler
59daba27d9 add locking assertions
Supported by:	FreeBSD Foundation
2003-11-08 23:02:36 +00:00
Sam Leffler
3c47a187b7 assert inpcb is locked in udp_output
Supported by:	FreeBSD Foundation
2003-11-08 23:00:48 +00:00
Sam Leffler
c29afad673 o correct locking problem: the inpcb must be held across tcp_respond
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond

Supported by:	FreeBSD Foundation
2003-11-08 22:59:22 +00:00
Sam Leffler
2a0746208b use local values instead of chasing pointers
Supported by:	FreeBSD Foundation
2003-11-08 22:57:13 +00:00
Sam Leffler
fa286d7db2 replace mtx_assert by INP_LOCK_ASSERT
Supported by:	FreeBSD Foundation
2003-11-08 22:55:52 +00:00
Sam Leffler
50d7c061a3 add some missing locking
Supported by:	FreeBSD Foundation
2003-11-08 22:53:41 +00:00
Sam Leffler
1d78192b35 the sbappendaddr call in socket_send must be protected by Giant
because it can happen from an MPSAFE callout

Supported by:	FreeBSD Foundation
2003-11-08 22:51:18 +00:00
Sam Leffler
e3f268fc89 add locking assertions that turn into noops if INET6 is configured;
this is necessary because the ipv6 code shares the in_pcb code with
ipv4 but (presently) lacks proper locking

Supported by:	FreeBSD Foundation
2003-11-08 22:48:27 +00:00
Sam Leffler
7902224c6b o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
  operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
  calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
  Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
  have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by:	FreeBSD Foundation
2003-11-08 22:28:40 +00:00
Sam Leffler
27a940c9a2 unbreak compilation of FAST_IPSEC
Supported by:	FreeBSD Foundation
2003-11-08 00:34:34 +00:00
Sam Leffler
aab621f060 MFp4: reminder that random id code is not reentrant
Supported by:	FreeBSD Foundation
2003-11-07 23:31:29 +00:00
Sam Leffler
8f1ee3683d Move uid/gid checking logic out of line and lock inpcb usage. This
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).

Supported by:	FreeBSD Foundation
2003-11-07 23:26:57 +00:00
Hajimu UMEMOTO
aef3a65eb7 use ipsec_getnhist() instead of obsoleted ipsec_gethist().
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Reviewed by:	Ari Suutari <ari@suutari.iki.fi> (ipfw@)
2003-11-07 20:25:47 +00:00
Sam Leffler
ad67584665 Fix locking of the ip forwarding cache. We were holding a reference
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd.  This caused major havoc (for some
reason it appeared most frequently for folks running natd).  Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us.  This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.

Supported by:	FreeBSD Foundation
2003-11-07 01:47:52 +00:00
Hajimu UMEMOTO
0f9ade718d - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
Robert Watson
3de758d3e3 Note that when ip_output() is called from ip_forward(), it will already
have its options inserted, so the opt argument to ip_output()  must be
NULL.
2003-11-03 18:03:05 +00:00
Robert Watson
eecfe773aa Remove comment about desire for eventual explicit labeling of ICMP
header copy made on input path: this is now handled differently.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-03 18:01:38 +00:00
Sam Leffler
04df2fbbb8 Remove bogus RTFREE that was added in rev 1.47. The rmx code operates
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.

Supported by:	FreeBSD Foundation
2003-11-03 06:11:44 +00:00
Sam Leffler
9ce7877897 Correct rev 1.56 which (incorrectly) reversed the test used to
decide if in_pcbpurgeif0 should be invoked.

Supported by:	FreeBSD Foundation
2003-11-03 03:22:39 +00:00
Mike Silbersack
4bd4fa3fe6 Add an additional check to the tcp_twrecycleable function; I had
previously only considered the send sequence space.  Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.

The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
2003-11-02 07:47:03 +00:00
Mike Silbersack
96af9ea52b - Add a new function tcp_twrecycleable, which tells us if the ISN which
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.

- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.

This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
2003-11-01 07:30:08 +00:00
Brooks Davis
9bf40ede4a Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
Sam Leffler
9c63e9dbd7 Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE).  This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by:	FreeBSD Foundation
2003-10-30 23:02:51 +00:00
Sam Leffler
d0402f1b73 Potential fix for races shutting down callouts when unloading
the module.  Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed.  Instead
stop the callout first then lock.

Supported by:	FreeBSD Foundation
2003-10-29 19:15:00 +00:00
Sam Leffler
3520e9d61d o add locking to protect routing table refcnt manipulations
o add some more debugging help for figuring out why folks are
  getting complaints about releasing routing table entries with
  a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG

Supported by:	FreeBSD Foundation
2003-10-29 19:03:58 +00:00
Hajimu UMEMOTO
59dfcba4aa add ECN support in layer-3.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
   make ip{,6}_ecn_egress() return integer to tell the caller that
   this packet should be dropped.
 - handle ECN at fragment reassembly in ip_input.c and frag6.c.

Obtained from:	KAME
2003-10-29 15:07:04 +00:00
Hajimu UMEMOTO
11de19f44d ip6_savecontrol() argument is redundant 2003-10-29 12:52:28 +00:00
Sam Leffler
9c855a36c1 Introduce the notion of "persistent mbuf tags"; these are tags that stay
with an mbuf until it is reclaimed.  This is in contrast to tags that
vanish when an mbuf chain passes through an interface.  Persistent tags
are used, for example, by MAC labels.

Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp.  This fixes problems
with "tag leakage".

Pointed out by:	Jonathan Stone
Reviewed by:	Robert Watson
2003-10-29 05:40:07 +00:00
Sam Leffler
395bb18680 speedup stream socket recv handling by tracking the tail of
the mbuf chain instead of walking the list for each append

Submitted by:	ps/jayanth
Obtained from:	netbsd (jason thorpe)
2003-10-28 05:47:40 +00:00
Hajimu UMEMOTO
618d51bbdc revert following unwanted changes:
- __packed to __attribute__((__packed__)
  -  uintN_t back to u_intN_t

Reported by:	bde
2003-10-25 10:57:08 +00:00
Hajimu UMEMOTO
16cd67e933 correct namespace pollution.
Submitted by:	bde
2003-10-25 09:37:10 +00:00
Hajimu UMEMOTO
c302f5bc07 remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{}
according to rfc2292bis.

Obtained from:	KAME
2003-10-24 20:37:05 +00:00
Hajimu UMEMOTO
5434eaa208 correct tab and order. 2003-10-24 19:51:49 +00:00
Hajimu UMEMOTO
f95d46333d Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542
(aka RFC2292bis).  Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.

Obtained from:	KAME
2003-10-24 18:26:30 +00:00
Mike Silbersack
0709c23335 Reduce the number of tcp time_wait structs to maxsockets / 5; this ensures
that at most 20% of sockets can be in time_wait at one time, ensuring
that time_wait sockets do not starve real connections from inpcb
structures.

No implementation change is needed, jlemon already implemented a nice
LRU-ish algorithm for tcp_tw structure recycling.

This should reduce the need for sysadmins to lower the default msl on
busy servers.
2003-10-24 05:44:14 +00:00
Sam Leffler
ac6b0748be o restructure initialization code so data structures are setup
when loaded as a module
o cleanup data structures on module unload when no application has
  been started (i.e. kldload, kldunload w/o mrtd)
o remove extraneous unlocks immediately prior to destroying them

Supported by:	FreeBSD Foundation
2003-10-24 00:09:18 +00:00
Mike Silbersack
184dcdc7c8 Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.
2003-10-21 18:28:36 +00:00
Hajimu UMEMOTO
b339980338 enclose IPv6 part with ifdef INET6.
Obtained from:	KAME
2003-10-20 16:19:01 +00:00
Hajimu UMEMOTO
31b3783c8d correct linkmtu handling.
Obtained from:	KAME
2003-10-20 15:27:48 +00:00
Hajimu UMEMOTO
31b1bfe1b0 - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from:	KAME
2003-10-17 15:46:31 +00:00
Sam Leffler
f51f805f7e pfil hooks can modify packet contents so check if the destination
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.

Supported by:	FreeBSD Foundation
Submitted by:	Pyun YongHyeon
2003-10-16 16:25:25 +00:00
Sam Leffler
b15694110f Drop dummynet lock when calling back into the network stack to deliver
packets.  This eliminates a LOR with Giant that caused outbound pipes
to fail.

Supported by:	FreeBSD Foundation
2003-10-16 16:21:25 +00:00
Kirk McKusick
b03587f06a Malloc buckets of size 128 have been having their 64-byte offset
trashed after being freed. This has caused several panics including
kern/42277 related to soft updates. Jim Kuhn tracked the problem
down to ipfw limit rule processing.  In the expiry of dynamic rules,
it is possible for an O_LIMIT_PARENT rule to be removed when it still
has live children.  When the children eventually do expire, a pointer
to the (long gone) parent is dereferenced and a count decremented.
Since this memory can, and is, allocated for other purposes (in the
case of kern/42277 an inodedep structure), chaos ensues. The offset
in question in inodedep is the offset of the 16 bit count field in
the ipfw2 ipfw_dyn_rule.

Submitted by:	Jim Kuhn <jkuhn@sandvine.com>
Reviewed by:	"Evgueni V. Gavrilov" <aquatique@rusunix.org>
Reviewed by:	Ben Pfountz <netprince@vt.edu>
MFC after:	1 week
2003-10-16 02:00:12 +00:00
Sam Leffler
b35a1e5d66 purge extraneous ';'s
Supported by:	FreeBSD Foundation
Noticed by:	bde
2003-10-15 18:19:28 +00:00
Sam Leffler
929b31ddab Lock ip forwarding route cache. While we're at it, remove the global
variable ipforward_rt by introducing an ip_forward_cacheinval() call
to use to invalidate the cache.

Supported by:	FreeBSD Foundation
2003-10-14 19:19:12 +00:00
Sam Leffler
888c2a3c4e remove dangling ';'s` that were harmless
Supported by:	FreeBSD Foundation
2003-10-14 18:45:50 +00:00
Hajimu UMEMOTO
06cd0a3f97 - fix typo in comment.
- style.

Obtained from:	KAME
2003-10-07 17:46:18 +00:00
Hajimu UMEMOTO
1ae02d474a nuke unused ICMPV6CTL_NAMES and KEYCTL_NAMES macros. 2003-10-07 15:14:33 +00:00
Hajimu UMEMOTO
8c99329e89 return(code) -> return (code)
Obtained from:	KAME
2003-10-07 15:02:29 +00:00
Sam Leffler
d1dd20be6e Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do-no know about mutex's.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
Sam Leffler
87002f0dc1 hookup ctlinput for fast ipsec versions of esp+ah protocols
Supported by:	FreeBSD Foundation
2003-10-03 22:06:36 +00:00
Sam Leffler
12394d06d8 place some kernel-specific data structures under #ifdef _KERNEL
Sponsored by:	FreeBSD Foundation
2003-10-03 20:58:56 +00:00
Bruce M Simpson
c3b52d6499 Shorten 'bad gateway' AF_LINK message.
Submitted by:	green
2003-10-03 17:22:14 +00:00
Bruce M Simpson
beb2ced8ac Make arp_rtrequest()'s 'bad gateway' messages slightly more informative,
to aid me in tracking down LLINFO inconsistencies in the routing table.

Discussed with:	fenner
2003-10-03 17:21:17 +00:00
Bruce M Simpson
b75bead1f2 Only delete the route if arplookup() tried to create it. Do not delete
RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed
when an RTF_GENMASK route exists in the table.

Add a more verbose comment about exactly what this code does.

Submitted by:	ru
2003-10-03 09:19:23 +00:00
Ruslan Ermilov
deb62e2887 By popular demand, added the "static ARP" per-interface option. 2003-10-01 08:32:37 +00:00
Hajimu UMEMOTO
5c6ebad8f6 add /*CONSTCOND*/ to reduce diffs against latest KAME.
Obtained from:	KAME
2003-09-25 13:40:06 +00:00
Bruce M Simpson
85cc199400 Fix a logic error in the check to see if arplookup() should free the route.
Noticed by:	Mike Hogsett
Reviewed by:	ru
2003-09-24 20:52:25 +00:00
Sam Leffler
134ea22494 o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by:	"Max Laier" <max@love2party.net>
Supported by:		FreeBSD Foundation
Obtained from:		NetBSD (bits of pfil.h and pfil.c)
2003-09-23 17:54:04 +00:00
Bruce M Simpson
fedf1d01a2 Fix a bug in arplookup(), whereby a hostile party on a locally
attached network could exhaust kernel memory, and cause a system
panic, by sending a flood of spoofed ARP requests.

Approved by:	jake (mentor)
Reported by:	Apple Product Security <product-security@apple.com>
2003-09-23 16:39:31 +00:00
Joe Marcus Clarke
68f1756b2a Grrr...add the Skinny alias code forgotten in the last commit. 2003-09-23 07:42:33 +00:00
Joe Marcus Clarke
b07fbc17e9 Add Cisco Skinny Station protocol support to libalias, natd, and ppp.
Skinny is the protocol used by Cisco IP phones to talk to Cisco Call
Managers.  With this code, one can use a Cisco IP phone behind a FreeBSD
NAT gateway.

Currently, having the Call Manager behind the NAT gateway is not supported.
More information on enabling Skinny support in libalias, natd, and ppp
can be found in those applications' manpages.

PR:		55843
Reviewed by:	ru
Approved by:	ru
MFC after:	30 days
2003-09-23 07:41:55 +00:00
Sam Leffler
598345da4b Bandaid locking change: mark static rule mutex recursive so re-entry when
sending an ICMP packet doesn't cause a panic.  A better solution is needed;
possibly defering the transmit to a dedicated thread.

Observed by:	"Aaron Wohl" <freebsd@soith.com>
2003-09-17 22:06:47 +00:00
Sam Leffler
f34f3a7097 shuffle code so we don't "continue" and miss a needed unlock operation
Observed by:	Wiktor Niesiobedzki <w@evip.pl>
2003-09-17 21:13:16 +00:00
Sam Leffler
293941a556 Add locking.
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage

Sponsored by:	FreeBSD Foundation
2003-09-17 00:56:50 +00:00
Sam Leffler
91176902bc Minor fixups + add locking.
o change time to MPSAFE callout
o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable
  by net.inet.ip.dummynet.debug
o make boot-time printf dependent on bootverbose

Sponsored by:	FreeBSD Foundation
2003-09-17 00:54:04 +00:00
Ruslan Ermilov
78f94aa951 Fix a bunch of off-by-one errors in the range checking code. 2003-09-11 21:40:21 +00:00
Ruslan Ermilov
8e75a37bb0 Fixed -Wpointer-arith warning.
Submitted by:	Stefan Farfeleder
PR:		bin/56653
2003-09-09 23:50:57 +00:00
Ruslan Ermilov
fe08efe680 mdoc(7): Use the new feature of the .In macro. 2003-09-08 19:57:22 +00:00
Sam Leffler
468cf6f61a Add locking.
Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and
fixing numerous problems.

Sponsored by:	FreeBSD Foundation
Reviewed by:	Pavlin Radoslavov <pavlin@icir.org>
2003-09-06 04:53:43 +00:00
Sam Leffler
2fad1e931e lock ip fragment queues
Submitted by:	Robert Watson <rwatson@freebsd.org>
Obtained from:	BSD/OS
2003-09-05 00:10:33 +00:00
Sam Leffler
26f91065e7 o add locking
o move the global divsrc socket address to a local variable
  instead of locking it

Sponsored by:	FreeBSD Foundation
2003-09-05 00:00:51 +00:00
Bruce M Simpson
8a538743b5 PR: kern/56343
Reviewed by:	tjr
Approved by:	jake (mentor)
2003-09-03 02:19:29 +00:00
Mike Silbersack
3390d47670 Implement MBUF_STRESS_TEST mark II.
Changes from the original implementation:

- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed.  Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.

- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments.  While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.

- Add two modes of random fragmentation.  Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
2003-09-01 05:55:37 +00:00
Sam Leffler
638ed548b7 add locking
NB: There is a known LOR on the forwarding path; this needs to be resolved
    together with a similar issue in the bridge.  For the moment it is
    believed to be benign.

Sponsored by:	FreeBSD Fondation
2003-09-01 05:12:36 +00:00
Sam Leffler
611ceef62a remove warning about use of old divert sockets; this was marked
for removal before 5.2

Reviewed by:	silence on -net and -arch
2003-09-01 04:27:34 +00:00
Sam Leffler
3b6dd5a9d0 add locking
Sponsored by:	FreeBSD Foundation
2003-09-01 04:23:48 +00:00
Robert Watson
f19389746e Remove redundant initialization of rti; SLIST_FOREACH does that for
us.
2003-08-28 22:15:05 +00:00
Robert Watson
6b48911b00 M_PREPEND() with an argument of M_TRYWAIT can fail, meaning the
returned mbuf can be NULL.  Check for NULL in rip_output() when
prepending an IP header.  This prevents mbuf exhaustion from
causing a local kernel panic when sending raw IP packets.

PR:		kern/55886
Reported by:	Pawel Malachowski <pawmal-posting@freebsd.lublin.pl>
MFC after:	3 days
2003-08-26 14:11:48 +00:00
Jeffrey Hsu
578c5e1212 Remove redundant bzero.
Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-24 08:27:57 +00:00
Robert Watson
baee0c3e66 Introduce two new MAC Framework and MAC policy entry points:
mac_reflect_mbuf_icmp()
  mac_reflect_mbuf_tcp()

These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket.  For example, in respond to a ping or a RST
packet to a SYN on a closed port.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-21 18:39:16 +00:00
Robert Watson
b8ecbcd287 Before digging into IGMP locking, do a whitespace and prototype cleanup:
prefer tabs to 8 spaces, focus on consistent indentation, prefer modern
C function prototypes.  Not all the way to style(9), but substantially
closer.
2003-08-20 17:32:17 +00:00
Robert Watson
6c4b2ad305 Move from a custom-crafted singly-linked list to the SLIST_* macros
from queue(3).

Improve vertical compactness by using a IGMP_PRINTF() macro rather
than #ifdefing IGMP_DEBUG a large number of debugging printfs.

Reviewed by:	mdodd (SLIST changes)
2003-08-20 17:09:01 +00:00
Bruce M Simpson
8afa230470 Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on
specific interfaces. This is required by aodvd, and may in future help us
in getting rid of the requirement for BPF from our import of isc-dhcp.

Suggested by:   fenestro
Obtained from:  BSD/OS
Reviewed by:    mini, sam
Approved by:    jake (mentor)
2003-08-20 14:46:40 +00:00
Sam Leffler
c06eb4e293 Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter.  This
does not change the behaviour; it just makes the intent more clear.
2003-08-19 17:51:11 +00:00
Jeffrey Hsu
9ba208b413 * Bug fix in bw_meter_process(): the periodically processed bins
of bw_meter entries were processed up to one second ahead.
  After an unappropriate rescheduling of some of the bw_meter
  entries, the upcalls weren't delivered.

* pim_register_prepare() uses the appropriate sw_csum flag to
  call ip_fragment() so the IP checksum is computed properly.

* Modify pim_register_prepare() to take care of IP packets that
  don't need fragmentation.

* Add-back in_delayed_cksum() to encap_send(), because it seems it
  should be there.

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-19 17:22:51 +00:00
Sam Leffler
53b57cd1ab add missing unlock when in_pcballoc returns an error 2003-08-19 17:11:46 +00:00
David E. O'Brien
4f4a104ee8 style.Makefile(5) 2003-08-18 15:25:39 +00:00
Gordon Tetlow
41d8423f71 Stage 3 of dynamic root support. Make all the libraries needed to run
binaries in /bin and /sbin installed in /lib. Only the versioned files
reside in /lib, the .so symlink continues to live /usr/lib so the
toolchain doesn't need to be modified.
2003-08-17 08:28:46 +00:00
Hartmut Brandt
a9ca5bdbd0 The syncache has made use of TCPDEBUG problematic, because the SYN
segments are lost for the application. This broke, for example,
ports/benchmarks/dbs which needs the SYN segment to filter the
contents of the trace buffer for the connection it is interested in.

This patch makes the SYN segments available again. Unfortunately they
are now associated with the listening socket instead of the new one, so
a change to applications is required, but without this patch it wouldn't
work altogether.

PR:		kern/45966
2003-08-13 10:20:57 +00:00
Hartmut Brandt
91f467d592 The tcp_trace call needs the length of the header. Unfortunately the
code has rotten a bit so that the header length is not correct at
the point when tcp_trace is called. Temporarily compute the correct
value before the call and restore the old value after. This makes
ports/benchmarks/dbs to almost work.

This is a NOP unless you compile with TCPDEBUG.
2003-08-13 08:50:42 +00:00
Hartmut Brandt
3c653157a5 A number of patches in the last years have created new return paths
in tcp_input that leave the function before hitting the tcp_trace
function call for the TCPDEBUG option. This has made TCPDEBUG mostly
useless (and tools like ports/benchmarks/dbs not working). Add
tcp_trace calls to the return paths that could be identified in this
maze.

This is a NOP unless you compile with TCPDEBUG.
2003-08-13 08:46:54 +00:00
Hartmut Brandt
b24521d779 Change the code that enables/disables the ATM channel to use the
new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels
for IP over ATM, but also CBR, VBR and ABR. Change the format of the
link layer address to specify the channel characteristics. The old
format is still supported and opens UBR channels.
2003-08-12 14:20:32 +00:00
Jeffrey Hsu
59ca77f4a1 New PIM header files.
Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-07 18:17:43 +00:00
Jeffrey Hsu
1e78ac216e 1. Basic PIM kernel support
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):

options	MROUTING		# Multicast routing
options	PIM			# Protocol Independent Multicast

2. Add support for advanced multicast API setup/configuration and
extensibility.

3. Add support for kernel-level PIM Register encapsulation.
Disabled by default.  Can be enabled by the advanced multicast API.

4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-07 18:16:59 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
Hartmut Brandt
20e57b1045 Ups. I forgot this one in the SIOCATMENA/SIOCATMDIS removal commit.
This change allows one to specify almost the complete traffic parameters
for IPoverATM channels through the routing table. Up to now we used
4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed.
If the address is longer, however, the 5th byte is interpreted as the
traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the
parameters for this traffic class:

  UBR: 0 byte or 3 byte PCR
  CBR: 3 byte PCR
  VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS
  ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM,
       1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF

A script to generate the corresponding 'route add' arguments will follow soon.
2003-08-06 15:56:37 +00:00
Jeffrey Hsu
1b6002ec30 * makes mfc[MFCTBLSIZ] and vif[MAXVIFS] tables accessible via
sysctl:
  - sysctlbyname("net.inet.ip.mfctable", ...)
  - sysctlbyname("net.inet.ip.viftable", ...)

  This change is needed so netstat can use sysctlbyname() to read
  the data from those tables.
  Otherwise, in some cases "netstat -g" may fail to report the
  multicast forwarding information (e.g., if we run a multicast
  router on PicoBSD).

* Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast
  routing daemon, set properly "im->im_vif" to the receiving
  incoming interface of the packet that triggered that upcall
  rather than to the expected incoming interface of that packet.

* Bug fix: add missing increment of counter "mrtstat.mrts_upcalls"

* Few formatting nits (e.g., replace extra spaces with TABs)

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-05 17:01:33 +00:00
Hartmut Brandt
7e3d4432af When adding a channel for INET failed at the device level (ioctl) the
code used to call rtrequest(RTM_DELETE, ...). This is a problem, because
the function that just has called us (route_output)
is not really happy with the route it just is creating beeing ripped out
from under it. Unfortunately we also cannot return an error from
ifa_rtrequest. Therefore mark the route just as RTF_REJECT.
2003-08-05 14:59:06 +00:00
Hartmut Brandt
5246b4ff88 Make this file to conform more to style(9) before really touching it. 2003-08-05 13:58:04 +00:00
Maxim Konovalov
e1bd2f381a o Fix a typo in previous commit. 2003-07-31 10:24:36 +00:00
Maxim Konovalov
853af3f3f0 o Do not overwrite saved interrupt priority level by alloc_hash(),
use a separate variable.
o Restore interrupt priority level before return (no-op in HEAD).

Spotted by:	Don Bowman <don@sandvine.com>
MFC after:	5 days
2003-07-25 09:59:16 +00:00
Sam Leffler
1f76a5e218 add IPSEC_FILTERGIF suport for FAST_IPSEC
PR:		kern/51922
Submitted by:	Eric Masson <e-masson@kisoft-services.com>
MFC after:	1 week
2003-07-22 18:58:34 +00:00
Mike Silbersack
7dc7f0311e Minor fix to the MBUF_STRESS_TEST code so that it keeps
pkthdr.len consistant at all times.  (Some debugging
code I'm working on is tripped otherwise.)

MFC after:	3 days
2003-07-19 05:50:32 +00:00
Robert Watson
83503a9227 Add a comment above rip_ctloutput() documenting that the privilege
check for raw IP system management operations is often (although
not always) implicit due to the namespacing of raw IP sockets.  I.e.,
you have to have privilege to get a raw IP socket, so much of the
management code sitting on raw IP sockets assumes that any requests
on the socket should be granted privilege.

Obtained from:	TrustedBSD Project
Product of:	France
2003-07-18 16:10:36 +00:00
Jeffrey Hsu
a12569ec4f Drop Giant around syncache timer processing. 2003-07-17 11:19:25 +00:00
Luigi Rizzo
4805529cf8 Allow set 31 to be used for rules other than 65535.
Set 31 is still special because rules belonging to it are not deleted
by the "ipfw flush" command, but must be deleted explicitly with
"ipfw delete set 31" or by individual rule numbers.

This implement a flexible form of "persistent rules" which you might
want to have available even after an "ipfw flush".
Note that this change does not violate POLA, because you could not
use set 31 in a ruleset before this change.

sbin/ipfw changes to allow manipulation of set 31 will follow shortly.

Suggested by: Paul Richards
2003-07-15 23:07:34 +00:00
Jeffrey Hsu
9d11646de7 Unify the "send high" and "recover" variables as specified in the
lastest rev of the spec.  Use an explicit flag for Fast Recovery. [1]

Fix bug with exiting Fast Recovery on a retransmit timeout
diagnosed by Lu Guohan. [2]

Reviewed by:		Thomas Henderson <thomas.r.henderson@boeing.com>
Reported and tested by:	Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
Approved by:		Thomas Henderson <thomas.r.henderson@boeing.com>,
			Sally Floyd <floyd@acm.org> [1]
2003-07-15 21:49:53 +00:00
Luigi Rizzo
72e02d4dac Implement comments embedded into ipfw2 instructions.
Since we already had 'O_NOP' instructions which always match, all
I needed to do is allow the NOP command to have arbitrary length
(i.e. move its label in a different part of the switch() which
validates instructions).

The kernel must know nothing about comments, everything else is
done in userland (which will be described in the upcoming ipfw2.c
commit).
2003-07-12 05:54:17 +00:00
Luigi Rizzo
7a1dfbc0d3 Merge the handlers of O_IP_SRC_MASK and O_IP_DST_MASK opcodes, and
support matching a list of addr/mask pairs so one can write
more efficient rulesets which were not possible before e.g.

    add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16

The change is fully backward compatible.
ipfw2 and manpage commit to follow.

MFC after: 3 days
2003-07-08 07:44:42 +00:00
Luigi Rizzo
c3e5b9f154 Implement the 'ipsec' option to match packets coming out of an ipsec tunnel.
Should work with both regular and fast ipsec (mutually exclusive).
See manpage for more details.

Submitted by: Ari Suutari (ari.suutari@syncrontech.com)
Revised by: sam
MFC after: 1 week
2003-07-04 21:42:32 +00:00
Luigi Rizzo
f030c1518d Correct some comments, add opcode O_IPSEC to match packets
coming out of an ipsec tunnel.
2003-07-04 21:39:51 +00:00
Luigi Rizzo
5d3b4c2480 Remove a stale comment, fix indentation. 2003-06-28 14:23:22 +00:00
Luigi Rizzo
b5f3c4cff3 whitespace fix 2003-06-28 14:16:53 +00:00
Luigi Rizzo
9c1cfc8650 remove unused file (ipfw2 is the default in RELENG_5 and above; the old
ipfw1 has been unused and unmaintained for a long time).
2003-06-24 07:12:11 +00:00
Luigi Rizzo
ec4270c021 Fix typo in a (commented out) debugging string.
Spotted by: diff
2003-06-23 21:38:21 +00:00
Luigi Rizzo
67ab48d1ae Remove whitespace at end of line. 2003-06-23 21:18:56 +00:00
Luigi Rizzo
44c884e134 Add support for multiple values and ranges for the "iplen", "ipttl",
"ipid" options. This feature has been requested by several users.
On passing, fix some minor bugs in the parser.  This change is fully
backward compatible so if you have an old /sbin/ipfw and a new
kernel you are not in trouble (but you need to update /sbin/ipfw
if you want to use the new features).

Document the changes in the manpage.

Now you can write things like

	ipfw add skipto 1000 iplen 0-500

which some people were asking to give preferential treatment to
short packets.

The 'MFC after' is just set as a reminder, because I still need
to merge the Alpha/Sparc64 fixes for ipfw2 (which unfortunately
change the size of certain kernel structures; not that it matters
a lot since ipfw2 is entirely optional and not the default...)

PR: bin/48015

MFC after: 1 week
2003-06-22 17:33:19 +00:00
Mike Silbersack
fcaf9f9146 Map icmp time exceeded responses to EHOSTUNREACH rather than 0 (no error);
this makes connect act more sensibly in these cases.

PR:				50839
Submitted by:			Barney Wolff <barney@pit.databus.com>
Patch delayed by laziness of:	silby
MFC after:			1 week
2003-06-17 06:21:08 +00:00
Ruslan Ermilov
ada24e690c In the PKT_ALIAS_PROXY_ONLY mode, make sure to preserve the
original source IP address, as promised in the manual page.

Spotted by:	Vaclav Petricek
2003-06-13 21:54:01 +00:00
Ruslan Ermilov
9c88dc8855 Removed a couple of .Xo/.Xc that are leftovers of the "ninth-argument
limit" mdoc(7) atavism.
2003-06-13 21:39:22 +00:00
Ruslan Ermilov
7176089886 Clarify that original address and port when doing transparent proxying
are _destination_ address and port.
2003-06-13 21:36:24 +00:00
Ruslan Ermilov
61de149d30 Added myself to the AUTHORS section. 2003-06-13 21:32:01 +00:00
Philippe Charnier
9703a107f2 The .Fn function 2003-06-08 09:53:08 +00:00
Robert Watson
042bbfa3b5 When setting fragment queue pointers to NULL, or comparing them with
NULL, use NULL rather than 0 to improve readability.
2003-06-06 19:32:48 +00:00
Jeffrey Hsu
f058535deb Compensate for decreasing the minimum retransmit timeout.
Reviewed by:	jlemon
2003-06-04 10:03:55 +00:00
Bernd Walter
330462a315 Change handling to support strong alignment architectures such as alpha and
sparc64.

PR:		alpha/50658
Submitted by:	rizzo
Tested on:	alpha
2003-06-04 01:17:37 +00:00
Kelly Yancey
ed7ea0e1ab Account for packets processed at layer-2 (i.e. net.link.ether.ipfw=1).
MFC after:	2 weeks
2003-06-02 23:54:09 +00:00
Ruslan Ermilov
234dfc904a A new API function PacketAliasRedirectDynamic() can be used
to mark a fully specified static link as dynamic; i.e. make
it a one-time link.
2003-06-01 23:15:00 +00:00
Ruslan Ermilov
f1a529f3da Make the PacketAliasSetAddress() function call optional. If it
is not called, and no static rules match an outgoing packet, the
latter retains its source IP address.  This is in support of the
"static NAT only" mode.
2003-06-01 22:49:59 +00:00
Poul-Henning Kamp
4df05d61bd Remove unused variables.
Found by:       FlexeLint
2003-06-01 09:20:38 +00:00
Poul-Henning Kamp
e4d2978dd8 Add /* FALLTHROUGH */
Found by:       FlexeLint
2003-05-31 19:07:22 +00:00
Garrett Wollman
6e49b1fe55 Don't generate an ip_id for packets with the DF bit set; ip_id is
only meaningful for fragments.  Also don't bother to byte-swap the
ip_id when we do generate it; it is only used at the receiver as a
nonce.  I tried several different permutations of this code with no
measurable difference to each other or to the unmodified version, so
I've settled on the one for which gcc seems to generate the best code.
(If anyone cares to microoptimize this differently for an architecture
where it actually matters, feel free.)

Suggested by:	Steve Bellovin's paper in IMW'02
2003-05-31 17:55:21 +00:00
Robert Watson
430c635447 Correct a bug introduced with reduced TCP state handling; make
sure that the MAC label on TCP responses during TIMEWAIT is
properly set from either the socket (if available), or the mbuf
that it's responding to.

Unfortunately, this is made somewhat difficult by the TCP code,
as tcp_twstart() calls tcp_twrespond() after discarding the socket
but without a reference to the mbuf that causes the "response".
Passing both the socket and the mbuf works arounds this--eventually
it might be good to make sure the mbuf always gets passed in in
"response" scenarios but working through this provided to
complicate things too much.

Approved by:	re (scottl)
Reviewed by:	hsu
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-07 05:26:27 +00:00
Robert Watson
688fe1d954 Trim a call to mac_create_mbuf_from_mbuf() since m_tag meta-data
copying for mbuf headers now works properly in m_dup_pkthdr(), so
we don't need to do an explicit copy.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-06 20:34:04 +00:00
Matthew N. Dodd
e97c58c8cf Add definitions for IN6ADDR_LINKLOCAL_ALLMDNS_INIT and INADDR_ALLMDNS_GROUP. 2003-04-29 22:03:46 +00:00
Matthew N. Dodd
4957466b8e IP_RECVTTL socket option.
Reviewed by:	Stuart Cheshire <cheshire@apple.com>
2003-04-29 21:36:18 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
David E. O'Brien
152385d122 Explicitly declare 'int' parameters. 2003-04-21 16:27:46 +00:00
David E. O'Brien
bfd738788b style.Makefile(5) 2003-04-20 18:38:59 +00:00
Mike Silbersack
53dcc544a8 Rename MBUF_FRAG_TEST to MBUF_STRESS_TEST as it will be extended
to include more than just frag tests.
2003-04-12 06:11:46 +00:00
Robert Watson
cacd79e2c9 Remove a potential panic condition introduced by reduced TCP wait
state.  Those changed attempted to work around the changed invariant
that inp->in_socket was sometimes now NULL, but the logic wasn't
quite right, meaning that inp->in_socket would be dereferenced by
cr_canseesocket() if security.bsd.see_other_uids, jail, or MAC
were in use.  Attempt to clarify and correct the logic.

Note: the work-around originally introduced with the reduced TCP
wait state handling to use cr_cansee() instead of cr_canseesocket()
in this case isn't really right, although it "Does the right thing"
for most of the cases in the base system.  We'll need to address
this at some point in the future.

Pointed out by:	dcs
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-10 20:33:10 +00:00
Dag-Erling Smørgrav
fe58453891 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
Dag-Erling Smørgrav
212059bd83 Replace memcpy() and ovbcopy() with bcopy(); ditch some caddr_t usage. 2003-04-04 12:14:00 +00:00
Matthew N. Dodd
2c56e246fa Back out support for RFC3514.
RFC3514 poses an unacceptale risk to compliant systems.
2003-04-02 20:14:44 +00:00
Matthew N. Dodd
4f6425f7ae - Use the correct constant define.
- Add a missing break.
2003-04-02 18:02:58 +00:00
Matthew N. Dodd
8faf6df9b3 Sync constant define with NetBSD.
Requested by:	 Tom Spindler <dogcow@babymeat.com>
2003-04-02 10:28:47 +00:00
Jeffrey Hsu
48d2549c3e Observe conservation of packets when entering Fast Recovery while
doing Limited Transmit.  Only artificially inflate the congestion
window by 1 segment instead of the usual 3 to take into account
the 2 already sent by Limited Transmit.

Approved in principle by:	Mark Allman <mallman@grc.nasa.gov>,
Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
2003-04-01 21:16:46 +00:00
Matthew N. Dodd
09139a4537 Implement support for RFC 3514 (The Security Flag in the IPv4 Header).
(See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt)

This fulfills the host requirements for userland support by
way of the setsockopt() IP_EVIL_INTENT message.

There are three sysctl tunables provided to govern system behavior.

	net.inet.ip.rfc3514:

		Enables support for rfc3514.  As this is an
		Informational RFC and support is not yet widespread
		this option is disabled by default.

	net.inet.ip.hear_no_evil

		 If set the host will discard all received evil packets.

	net.inet.ip.speak_no_evil

		If set the host will discard all transmitted evil packets.

The IP statistics counter 'ips_evil' (available via 'netstat') provides
information on the number of 'evil' packets recieved.

For reference, the '-E' option to 'ping' has been provided to demonstrate
and test the implementation.
2003-04-01 08:21:44 +00:00
Maxim Konovalov
7778283b40 Fix indentation. 2003-03-27 15:00:10 +00:00
Maxim Konovalov
be1e4c5162 o Protect set_fs_param() by splimp(9).
Quote from kern/37573:

	There is an obvious race in netinet/ip_dummynet.c:config_pipe().
	Interrupts are not blocked when changing the params of an
	existing pipe.  The specific crash observed:

	... -> config_pipe -> set_fs_parms -> config_red

	malloc a new w_q_lookup table but take an interrupt before
	intializing it, interrupt handler does:

	... -> dummynet_io -> red_drops

	red_drops dereferences the uninitialized (zeroed) w_q_lookup
	table.

o Flush accumulated credits for idle pipes.
o Flush accumulated credits when change pipe characteristics.
o Change dn_flow_queue.numbytes type to unsigned long.

	Overlapping dn_flow_queue->numbytes in ready_event() leads to
	numbytes becomes negative and SET_TICKS() macro returns a very
	big value.  heap_insert() overlaps dn_key again and inserts a
	queue to a ready heap with a sched_time points to the past.
	That leads to an "infinity" loop.

PR:		kern/33234, kern/37573, misc/42459, kern/43133,
		kern/44045, kern/48099
Submitted by:	Mike Hibler <mike@cs.utah.edu> (kern/37573)
MFC after:	6 weeks
2003-03-27 14:56:36 +00:00
Robert Watson
5e7ce4785f Modify the mac_init_ipq() MAC Framework entry point to accept an
additional flags argument to indicate blocking disposition, and
pass in M_NOWAIT from the IP reassembly code to indicate that
blocking is not OK when labeling a new IP fragment reassembly
queue.  This should eliminate some of the WITNESS warnings that
have started popping up since fine-grained IP stack locking
started going in; if memory allocation fails, the creation of
the fragment queue will be aborted.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-26 15:12:03 +00:00
Maxime Henrion
511e01e2d6 Try to make the MBUF_FRAG_TEST code work better.
- Don't try to fragment the packet if it's smaller than mbuf_frag_size.
- Preserve the size of the mbuf chain which is modified by m_split().
- Check that m_split() didn't return NULL.
- Make it so we don't end up with two M_PKTHDR mbuf in the chain.
- Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole
  chain and not just the first mbuf.
- Fix a nearby style bug and rework the logic of the loops so that it's
  more clear.

This is still not quite right, because we're clearly abusing m_split() to
do something it was not designed for, but at least it works now.  We
should probably move this code into a m_fragment() function when it's
correct.
2003-03-25 23:49:14 +00:00
Mike Silbersack
9d9edc5693 Add the MBUF_FRAG_TEST option. When compiled in, this option
allows you to tell ip_output to fragment all outgoing packets
into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes.
This is an excellent way to test if network drivers can properly
handle long mbuf chains being passed to them.

net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation)
so that you can at least boot before your network driver dies. :)
2003-03-25 05:45:05 +00:00
Maxime Henrion
aecfcdb824 Use __packed instead of __attribute__((__packed__)). 2003-03-22 00:25:14 +00:00
Matthew N. Dodd
57842a38fd Add a sysctl node allowing the specification of an address mask to use
when replying to ICMP Address Mask Request packets.
2003-03-21 15:43:06 +00:00
Matthew N. Dodd
21150298bb Add comments regarding the ICMP timestamp fields. 2003-03-21 15:28:10 +00:00
Crist J. Clark
010dabb047 Add a 'verrevpath' option that verifies the interface that a packet
comes in on is the same interface that we would route out of to get to
the packet's source address. Essentially automates an anti-spoofing
check using the information in the routing table.

Experimental. The usage and rule format for the feature may still be
subject to change.
2003-03-15 01:13:00 +00:00
Jeffrey Hsu
7792ea2700 Greatly simplify the unlocking logic by holding the TCP protocol lock until
after FIN_WAIT_2 processing.

Helped with debugging:	Doug Barton
2003-03-13 11:46:57 +00:00
Jeffrey Hsu
da3a8a1a4f Add support for RFC 3390, which allows for a variable-sized
initial congestion window.
2003-03-13 01:43:45 +00:00
Jeffrey Hsu
582a954b00 Implement the Limited Transmit algorithm (RFC 3042). 2003-03-12 20:27:28 +00:00
Sam Leffler
4a692a1fc2 correct two more flag misuses; m_tag* use malloc flags 2003-03-12 14:45:22 +00:00
Jonathan Lemon
a3b6edc353 Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure.
Sponsored by: DARPA, NAI Labs
2003-03-08 22:07:52 +00:00
Jonathan Lemon
607b0b0cc9 Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one.  Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs
2003-03-08 22:06:20 +00:00
Peter Wemm
3c6b084e96 Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.
2003-03-05 19:24:24 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Jonathan Lemon
272c5dfe93 In timewait state, if the incoming segment is a pure in-sequence ack
that matches snd_max, then do not respond with an ack, just drop the
segment.  This fixes a problem where a simultaneous close results in
an ack loop between two time-wait states.

Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG>
Sponsored by: DARPA, NAI Labs
2003-02-26 18:20:41 +00:00
Jonathan Lemon
ef6b48deb9 The TCP protocol lock may still be held if the reassembly queue dropped FIN.
Detect this case and drop the lock accordingly.

Sponsored by: DARPA, NAI Labs
2003-02-26 13:55:13 +00:00
Mike Silbersack
a75a485d62 Fix a condition so that ip reassembly queues are emptied immediately
when maxfragpackets is dropped to 0.

Noticed by:	bmah
2003-02-26 07:28:35 +00:00
Robert Watson
9327ee33bf When generating a TCP response to a connection, not only test if the
tcpcb is NULL, but also its connected inpcb, since we now allow
elements of a TCP connection to hang around after other state, such
as the socket, has been recycled.

Tested by:	dcs
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-02-25 14:08:41 +00:00
Maxim Konovalov
b36f5b3735 style(9): join lines. 2003-02-25 11:53:11 +00:00
Maxim Konovalov
99e8617d24 Ip reassembly queue structure has ipq_nfrags now. Count a number of
dropped ip fragments precisely.

Reviewed by:	silby
2003-02-25 11:49:01 +00:00
Jeffrey Hsu
edf02ff15d Hold the TCP protocol lock while modifying the connection hash table. 2003-02-25 01:32:03 +00:00
Mike Silbersack
af9c7d06d5 Fix a comment which didn't match the new cookie behavior.
Submitted by:	Scott Renfro <scott@renfro.org>
MFC after:	1 day
2003-02-24 03:15:48 +00:00
Jeffrey Hsu
11a20fb8b6 tcp_twstart() need to be called with the TCP protocol lock held to avoid
a race condition with the TCP timer routines.
2003-02-24 00:52:03 +00:00
Jeffrey Hsu
2fbef91887 Pass the right function to callout_reset() for a compressed
TIME-WAIT control block.
2003-02-24 00:48:12 +00:00
Mike Silbersack
a432399c56 Improve the security and performance of syncookies:
Security improvements:
- Increase the size of each syncookie secret from 32 to 128 bits
  in order to make brute force attacks on the secrets much more
  difficult.
- Always return the lowest order dword from the MD5 hash; this
  allows us to expose 2 more bits of the cookie and makes ACK
  floods which seek to guess the cookie value more difficult.

Performance improvements:
- Increase the lifetime of each syncookie from 4 seconds to 16
  seconds.  This increases the usefulness of syncookies during
  an attack.
- From Yahoo!: Reduce the number of calls to MD5Update; this
  results in a ~17% increase in cookie generation time here.

Reviewed by:	hsu, jayanth, jlemon, nectar
MFC After:	15 seconds
2003-02-23 19:04:23 +00:00
Jonathan Lemon
f243998be5 Yesterday just wasn't my day. Remove testing delta that crept into the diff.
Pointy hat provided by: sam
2003-02-23 15:40:36 +00:00
Sam Leffler
14dd6717f8 Add a new config option IPSEC_FILTERGIF to control whether or not
packets coming out of a GIF tunnel are re-processed by ipfw, et. al.
By default they are not reprocessed.  With the option they are.

This reverts 1.214.  Prior to that change packets were not re-processed.
After they were which caused problems because packets do not have
distinguishing characteristics (like a special network if) that allows
them to be filtered specially.

This is really a stopgap measure designed for immediate MFC so that
4.8 has consistent handling to what was in 4.7.

PR:		48159
Reviewed by:	Guido van Rooij <guido@gvr.org>
MFC after:	1 day
2003-02-23 00:47:06 +00:00
Jonathan Lemon
a14c749f04 Check to see if the TF_DELACK flag is set before returning from
tcp_input().  This unbreaks delack handling, while still preserving
correct T/TCP behavior

Tested by: maxim
Sponsored by: DARPA, NAI Labs
2003-02-22 21:54:57 +00:00
Mike Silbersack
375386e284 Add the ability to limit the number of IP fragments allowed per packet,
and enable it by default, with a limit of 16.

At the same time, tweak maxfragpackets downward so that in the worst
possible case, IP reassembly can use only 1/2 of all mbuf clusters.

MFC after: 	3 days
Reviewed by:	hsu
Liked by:	bmah
2003-02-22 06:41:47 +00:00
Poul-Henning Kamp
d25ecb917b - m = m_gethdr(M_NOWAIT, MT_HEADER);
+       m = m_gethdr(M_DONTWAIT, MT_HEADER);

'nuff said.
2003-02-21 23:17:12 +00:00
Crist J. Clark
b0d226932e The ancient and outdated concept of "privileged ports" in UNIX-type
OSes has probably caused more problems than it ever solved. Allow the
user to retire the old behavior by specifying their own privileged
range with,

  net.inet.ip.portrange.reservedhigh  default = IPPORT_RESERVED - 1
  net.inet.ip.portrange.reservedlo    default = 0

Now you can run that webserver without ever needing root at all. Or
just imagine, an ftpd that can really drop privileges, rather than
just set the euid, and still do PORT data transfers from 20/tcp.

Two edge cases to note,

  # sysctl net.inet.ip.portrange.reservedhigh=0

Opens all ports to everyone, and,

  # sysctl net.inet.ip.portrange.reservedhigh=65535

Locks all network activity to root only (which could actually have
been achieved before with ipfw(8), but is somewhat more
complicated).

For those who stick to the old religion that 0-1023 belong to root and
root alone, don't touch the knobs (or even lock them by raising
securelevel(8)), and nothing changes.
2003-02-21 05:28:27 +00:00
Jonathan Lemon
8608c4c1f9 Remove unused variables in the IPSEC case.
Submitted by:  Lars Eggert <larse@ISI.EDU>
2003-02-20 18:22:21 +00:00
Jonathan Lemon
ffae8c5a7e Unbreak non-IPV6 compilation.
Caught by: phk
Sponsored by: DARPA, NAI Labs
2003-02-19 23:43:04 +00:00
Jonathan Lemon
340c35de6a Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
Jonathan Lemon
7990938421 Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the
routine does not require a tcpcb to operate.  Since we no longer keep
template mbufs around, move pseudo checksum out of this routine, and
merge it with the length update.

Sponsored by: DARPA, NAI Labs
2003-02-19 22:18:06 +00:00
Jonathan Lemon
414462252a Correct comments. 2003-02-19 21:33:46 +00:00
Jonathan Lemon
3bfd6421c2 Clean up delayed acks and T/TCP interactions:
- delay acks for T/TCP regardless of delack setting
   - fix bug where a single pass through tcp_input might not delay acks
   - use callout_active() instead of callout_pending()

Sponsored by: DARPA, NAI Labs
2003-02-19 21:18:23 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Maxim Konovalov
b52d5ea3d2 o Fix ipfw uid rules: socheckuid() returns 0 when uid matches a socket
cr_uid.

Note: we do not have socheckuid() in RELENG_4, ip_fw2.c uses its
own macro for a similar purpose that is why ipfw2 in RELENG_4 processes
uid rules correctly. I will MFC the diff for code consistency.

Reported by:	Oleg Baranov <ol@csa.ru>
Reviewed by:	luigi
MFC after:	1 month
2003-02-17 13:39:57 +00:00
Jeffrey Hsu
4b40c56c28 Take advantage of pre-existing lock-free synchronization and type stable memory
to avoid acquiring SMP locks during expensive copyout process.
2003-02-15 02:37:57 +00:00
Jeffrey Hsu
85e8b24343 The protocol lock is always held in the dropafterack case, so we don't
need to check for it at runtime.
2003-02-13 22:14:22 +00:00
Jeffrey Hsu
3dc7ebf9ff in_pcbnotifyall() requires an exclusive protocol lock for notify functions
which modify the connection list, namely, tcp_notify().
2003-02-12 23:55:07 +00:00
Jeffrey Hsu
6d45d64a8f Properly document that syncache timer processing requires an
exclusive TCP protocol lock.
2003-02-12 00:42:12 +00:00
Seigo Tanimura
cd6c2a8874 s/IPSSEC/IPSEC/ 2003-02-11 10:51:56 +00:00
Jeffrey Hsu
24652ff6e1 Get cosmetic changes out of the way before I add routing table SMP locks. 2003-02-10 22:01:34 +00:00
Orion Hodson
022695f82a Avoid multiply for preemptive arp calculation since it hits every
ethernet packet sent.

Prompted by: Jeffrey Hsu <hsu@FreeBSD.org>
2003-02-08 15:05:15 +00:00
Orion Hodson
73224fb019 MFS 1.64.2.22: Re-enable non pre-emptive ARP requests.
Submitted by: "Diomidis Spinellis" <dds@aueb.gr>
PR:           kern/46116
2003-02-04 05:28:08 +00:00
Crist J. Clark
39eb27a4a9 Add the TCP flags to the log message whenever log_in_vain is 1, not
just when set to 2.

PR:		kern/43348
MFC after:	5 days
2003-02-02 22:06:56 +00:00
Mike Silbersack
ecf44c01f4 Move a comment and optimize the frag timeout code a slight bit.
Submitted by:	maxim
MFC with:	The previous two revisions
2003-02-01 05:59:51 +00:00
Sam Leffler
9359ad861e FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from
ipsec4_process_packet; they happen when a packet is dropped because
an SA acquire is initiated

Submitted by:	Doug Ambrisko <ambrisko@verniernetworks.com>
2003-01-30 05:45:45 +00:00
Sam Leffler
28a34902c4 remove the restriction on build a kernel with FAST_IPSEC and INET6;
you still don't want to use the two together, but it's ok to have
them in the same kernel (the problem that initiated this bandaid
has long since been fixed)
2003-01-30 05:43:08 +00:00
Mike Silbersack
d4d5315c23 Fix a bug with syncookies; previously, the syncache's MSS size was not
initialized until after a syncookie was generated.  As a result,
all connections resulting from a returned cookie would end up using
a MSS of ~512 bytes.  Now larger packets will be used where possible.

MFC after:	5 days
2003-01-29 03:49:49 +00:00
Poul-Henning Kamp
4ee6e70ef3 Check bounds for index before dereferencing memory past end of array.
Found by:	FlexeLint
2003-01-28 22:44:12 +00:00
Jeffrey Hsu
93f798891a Avoid lock order reversal by expanding the scope of the
AF_INET radix tree lock to cover the ARP data structures.
2003-01-28 20:22:19 +00:00
Mike Silbersack
ac64c8668b A few fixes to rev 1.221
- Honor the previous behavior of maxfragpackets = 0 or -1
- Take a better stab at fragment statistics
- Move / correct a comment

Suggested by:	maxim@
MFC after:	7 days
2003-01-28 03:39:39 +00:00
Mike Silbersack
402062e80c Merge the best parts of maxfragpackets and maxnipq together. (Both
functions implemented approximately the same limits on fragment memory
usage, but in different fashions.)

End user visible changes:
- Fragment reassembly queues are freed in a FIFO manner when maxfragpackets
  has been reached, rather than all reassembly stopping.

MFC after: 	5 days
2003-01-26 01:44:05 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Maxim Konovalov
2adf7582da De-anonymity a couple of messages I missed in a previous sweep.
Move one of them under DEB macro.

Noticed by:	Wiktor Niesiobedzki <w@evip.pl>
2003-01-20 13:03:34 +00:00
Maxim Konovalov
8ec22a9363 If the first action is O_LOG adjust a pointer to the real one, unbreaks
skipto + log rules.

Reported by:	Wiktor Niesiobedzki <w@evip.pl>
MFC after:	1 week
2003-01-20 11:58:34 +00:00
Jeffrey Hsu
314e5a3daf Optimize away call to bzero() in the common case by directly checking
if a connection has any cached TAO information.
2003-01-18 19:03:26 +00:00
Jeffrey Hsu
f5c5746047 Fix long-standing bug predating FreeBSD where calling connect() twice
on a raw ip socket will crash the system with a null-dereference.
2003-01-18 01:10:55 +00:00
Jeffrey Hsu
c996428c32 SMP locking for ARP. 2003-01-17 07:59:35 +00:00
Matthew Dillon
fe41ca530c Introduce the ability to flag a sysctl for operation at secure level 2 or 3
in addition to secure level 1.  The mask supports up to a secure level of 8
but only add defines through CTLFLAG_SECURE3 for now.

As per the missif in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctl's in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.

Reviewed by:	imp
With Design Suggestions by:	imp
2003-01-14 19:35:33 +00:00
Jeffrey Hsu
cb942153c8 Fix NewReno.
Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
2003-01-13 11:01:20 +00:00
Thomas Moestl
a9a7a91220 Clear the target hardware address field when generating an ARP request.
Reviewed by:	nectar
MFC after:	1 week
2003-01-10 00:04:53 +00:00
Jeffrey Hsu
b21bf9a59b Validate inp before de-referencing it.
Submitted by:	pb
2003-01-05 07:56:24 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Sam Leffler
9967cafc49 Correct mbuf packet header propagation. Previously, packet headers
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a  "copy" operation.  This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain.  This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.

These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block.  This
introduces an incompatibility with openbsd which we may want to revisit.

Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them.  We may want to support this;
for now we watch for it with an assert.

Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.

Supported by:	Vernier Networks
Reviewed by:	Robert Watson <rwatson@FreeBSD.org>
2002-12-30 20:22:40 +00:00
Matthew Dillon
07fd333df3 Remove the PAWS ack-on-ack debugging printf().
Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of
order / reverse-time-indexed packet should be acknowledged as specified
in RFC-793 page 69 then dropped.  The original PAWS code in FreeBSD (1994)
simply acknowledged the segment unconditionally, which is incorrect, and
was fixed in 1.183 (2002).  At the moment we do not do checks for SYN or FIN
in addition to (tlen != 0), which may or may not be correct, but the
worst that ought to happen should be a retry by the sender.
2002-12-30 19:31:04 +00:00
Sam Leffler
069f35d328 correct style bogons 2002-12-30 18:45:31 +00:00
Ian Dowse
ed1a13b18f Bridged packets are supplied to the firewall with their IP header
in network byte order, but icmp_error() expects the IP header to
be in host order and the code here did not perform the necessary
swapping for the bridged case. This bug causes an "icmp_error: bad
length" panic when certain length IP packets (e.g. ip_len == 0x100)
are rejected by the firewall with an ICMP response.

MFC after:	3 days
2002-12-27 17:43:25 +00:00
Jeffrey Hsu
abe239cfe2 Validate inp to prevent an use after free. 2002-12-24 21:00:31 +00:00
Maxim Konovalov
f4ef616f98 o De-anonymity dummynet(4) and ipfw(4) messages, prepend them
by 'dummynet: ' and 'ipfw: ' prefixes.

PR:		kern/41609
2002-12-24 13:45:24 +00:00
Jeffrey Hsu
956b0b653c SMP locking for radix nodes. 2002-12-24 03:03:39 +00:00
Pierre Beyssac
1ba7727b9e Remove forgotten INP_UNLOCK(inp) in my previous commit.
Reported by: hsu
2002-12-22 13:04:08 +00:00
Pierre Beyssac
87cd4001b5 In syncache_timer(), don't attempt to lock the inpcb structure
associated with the syncache entry: in case tcp_close() has been
called on the corresponding listening socket, the lock has been
destroyed as a side effect of in_pcbdetach(), causing a panic when
we attempt to lock on it.

Reviewed by:	hsu
2002-12-21 19:59:47 +00:00
Sam Leffler
00f21882a0 replace the special-purpose rate-limiting code with the general facility
just added; this tries to maintain the same behaviour vis a vis printing
the rate-limiting messages but need tweaking
2002-12-21 00:08:20 +00:00
Jeffrey Hsu
9a39fc9d73 Eliminate a goto.
Fix some line breaks.
2002-12-20 11:24:02 +00:00
Jeffrey Hsu
540e8b7e31 Unravel a nested conditional.
Remove an unneeded local variable.
2002-12-20 11:16:52 +00:00
Jeffrey Hsu
f320a1bfd2 Expand scope of TCP protocol lock to cover syncache data structures. 2002-12-20 00:24:19 +00:00
Bosko Milekic
86fea6be59 o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and
the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call
  bpf_alloc() and pass the 'canwait' flag(s) along.  It's been changed
  to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT
  flag (and only one of those two).

Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)
2002-12-19 22:58:27 +00:00
Jeffrey Hsu
19fc74fb60 Lock up ifaddr reference counts. 2002-12-18 11:46:59 +00:00
Poul-Henning Kamp
11aee0b4b0 Remove unused and incorrectly maintained variable "in_interfaces" 2002-12-17 19:30:04 +00:00
Matthew Dillon
967adce8df Fix syntax in last commit. 2002-12-17 00:24:48 +00:00
Maxim Konovalov
616fa7460c o Trim EOL whitespaces.
MFC after:	1 week
2002-12-15 10:24:36 +00:00
Maxim Konovalov
21ef23ab3f o s/if_name[16]/if_name[IFNAMSIZ]/
Reviewed by:	luigi
MFC after:	1 week
2002-12-15 10:23:02 +00:00
Maxim Konovalov
2713a5bebb o M_DONTWAIT is mbuf(9) flag: malloc(M_DONTWAIT) -> malloc(M_NOWAIT).
The bug does not affect anything because M_NOWAIT == M_DONTWAIT.

Reviewed by:	luigi
MFC after:	1 week
2002-12-15 10:21:30 +00:00
Maxim Konovalov
83b75b7621 o Fix byte order logging issue: sa.sin_port is already in host byte order.
PR:		kern/45964
Submitted by:	Sascha Blank <sblank@tiscali.de>
Reviewed by:	luigi
MFC after:	1 week
2002-12-15 09:44:02 +00:00
Matthew Dillon
d7ff8ef62a Change tcp.inflight_min from 1024 to a production default of 6144. Create
a sysctl for the stabilization value for the bandwidth delay product (inflight)
algorithm and document it.

MFC after:	3 days
2002-12-14 21:00:17 +00:00
Matthew Dillon
1ab4789dc2 Bruce forwarded this tidbit from an analysis Van Jacobson did on an
apparent ack-on-ack problem with FreeBSD.  Prof. Jacobson noticed a
case in our TCP stack which would acknowledge a received ack-only packet,
which is not legal in TCP.

Submitted by:	 Van Jacobson <van@packetdesign.com>,
		bmah@packetdesign.com (Bruce A. Mah)
MFC after:	7 days
2002-12-14 07:31:51 +00:00
Maxim Sobolev
16199bf2d3 MFS: recognize gre packets used in the WCCP protocol.
Approved by:	re
2002-12-07 14:22:05 +00:00
Luigi Rizzo
97850a5dd9 Move fw_one_pass from ip_fw2.c to ip_input.c so that neither
bridge.c nor if_ethersubr.c depend on IPFIREWALL.
Restore the use of fw_one_pass in if_ethersubr.c

ipfw.8 will be updated with a separate commit.

Approved by: re
2002-11-20 19:07:27 +00:00
Luigi Rizzo
032dcc7680 Back out some style changes. They are not urgent,
I will put them back in after 5.0 is out.

Requested by: sam
Approved by: re
2002-11-20 19:00:54 +00:00
Luigi Rizzo
b375c9ec2c Back out the ip_fragment() code -- it is not urgent to have it in now,
I will put it back in in a better form after 5.0 is out.

Requested by: sam, rwatson, luigi (on second thought)
Approved by: re
2002-11-20 18:56:25 +00:00
Mike Silbersack
df285b3d1d Add a sysctl to control the generation of source quench packets,
and set it to 0 by default.

Partially obtained from:	NetBSD
Suggested by:	David Gilbert
MFC after:	5 days
2002-11-19 17:06:06 +00:00
Luigi Rizzo
9b77fbf0a2 Fix function headers and remove 'register' variable declarations. 2002-11-17 17:04:19 +00:00
Luigi Rizzo
3e372e140c Move the ip_fragment code from ip_output() to a separate function,
so that it can be reused elsewhere (there is a number of places
where it can be useful). This also trims some 200 lines from
the body of ip_output(), which helps readability a bit.

(This change was discussed a few weeks ago on the mailing lists,
Julian agreed, silence from others. It is not a functional change,
so i expect it to be ok to commit it now but i am happy to back it
out if there are objections).

While at it, fix some function headers and replace m_copy() with
m_copypacket() where applicable.

MFC after: 1 week
2002-11-17 16:30:44 +00:00
Luigi Rizzo
20fab86349 Minor documentation changes and indentation fix.
Replace m_copy() with m_copypacket() where applicable.

While at it, fix some function headers and remove 'register' from
variable declarations.
2002-11-17 16:13:08 +00:00
Luigi Rizzo
4e8fe3210d Cleanup some of the comments, and reformat long lines.
Replace m_copy() with m_copypacket() where applicable.

Replace "if (a.s_addr ...)" with "if (a.s_addr != INADDR_ANY ...)"
to make it clear what the code means.

While at it, fix some function headers and remove 'register' from
variable declarations.

MFC after: 3 days
2002-11-17 16:02:17 +00:00
Luigi Rizzo
bbb4330b61 Massive cleanup of the ip_mroute code.
No functional changes, but:

  + the mrouting module now should behave the same as the compiled-in
    version (it did not before, some of the rsvp code was not loaded
    properly);
  + netinet/ip_mroute.c is now truly optional;
  + removed some redundant/unused code;
  + changed many instances of '0' to NULL and INADDR_ANY as appropriate;
  + removed several static variables to make the code more SMP-friendly;
  + fixed some minor bugs in the mrouting code (mostly, incorrect return
    values from functions).

This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).

Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.

Detailed changes:
--------------------
netinet/ip_mroute.c     all the above.
conf/files              make ip_mroute.c optional
net/route.c             fix mrt_ioctl hook
netinet/ip_input.c      fix ip_mforward hook, move rsvp_input() here
                        together with other rsvp code, and a couple
                        of indentation fixes.
netinet/ip_output.c     fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h        rsvp function hooks
netinet/raw_ip.c        hooks for mrouting and rsvp functions, plus
                        interface cleanup.
netinet/ip_mroute.h     remove an unused and optional field from a struct

Most of the code is from Pavlin Radoslavov and the XORP project

Reviewed by: sam
MFC after: 1 week
2002-11-15 22:53:53 +00:00
Sam Leffler
eec3a0b17f track changes to not strip the Ethernet header from input packets
Reviewed by:	many
Approved by:	re
2002-11-14 23:46:04 +00:00
Sam Leffler
ccb2acfe1b track bpf changes
Reviewed by:	many
Approved by:	re
2002-11-14 23:45:13 +00:00
Maxim Konovalov
8ef1565d2b Due to a memory alignment sizeof(struct ipfw_flow_id) is bigger than
ipfw_flow_id structure actual size and bcmp(3) may fail to compare
them properly. Compare members of these structures instead.

PR:		kern/44078
Submitted by:	Oleg Bulyzhin <oleg@rinet.ru>
Reviewed by:	luigi
MFC after:	2 weeks
2002-11-13 11:31:44 +00:00
Jeffrey Hsu
e1e1b6e892 Turn off duplicate lock checking for inp locks because udp_input()
intentionally locks two inp records simultaneously.
2002-11-12 20:44:38 +00:00
Sam Leffler
6f0d017cf4 a better solution to building FAST_IPSEC w/o INET6
Submitted by:	Jeffrey Hsu <hsu@FreeBSD.org>
2002-11-10 17:17:32 +00:00
Alfred Perlstein
29f194457c Fix instances of macros with improperly parenthasized arguments.
Verified by: md5
2002-11-09 12:55:07 +00:00
Sam Leffler
9c0a8ace11 temporarily disallow FAST_IPSEC and INET6 to avoid potential panics;
will correct this before 5.0 release
2002-11-08 23:50:32 +00:00
Sam Leffler
e8539d32f0 FAST_IPSEC fixups:
o fix #ifdef typo
o must use "bounce functions" when dispatched from the protosw table

don't know how this stuff was missed in my testing; must've committed
the wrong bits

Pointy hat:	sam
Submitted by:	"Doug Ambrisko" <ambrisko@verniernetworks.com>
2002-11-08 23:37:50 +00:00
Sam Leffler
58fcadfc0f fixup FAST_IPSEC build w/o INET6 2002-11-08 23:33:59 +00:00
Sam Leffler
ab94ca3cec correct fast ipsec logic: compare destination ip address against the
contents of the SA, not the SP

Submitted by:	"Doug Ambrisko" <ambrisko@verniernetworks.com>
2002-11-08 23:11:02 +00:00
John Baldwin
2d4e26522d Cast a ptrdiff_t to an int to printf. 2002-11-08 14:52:26 +00:00
Jeff Roberson
1645d0903e - Consistently update snd_wl1, snd_wl2, and rcv_up in the header
prediction code.  Previously, 2GB worth of header predicted data
   could leave these variables too far out of sequence which would cause
   problems after receiving a packet that did not match the header
   prediction.

Submitted by:	Bill Baumann <bbaumann@isilon.com>
Sponsored by:	Isilon Systems, Inc.
Reviewed by:	hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com
2002-10-31 23:24:13 +00:00
Jeffrey Hsu
30613f5610 Don't need to check if SO_OOBINLINE is defined.
Don't need to protect isipv6 conditional with INET6.
Fix leading indentation in 2 lines.
2002-10-30 08:32:19 +00:00
Bill Fenner
4d3ffc9841 Renumber IPPROTO_DIVERT out of the range of valid IP protocol numbers.
This allows socket() to return an error when the kernel is not built
with IPDIVERT, and doesn't prevent future applications from using the
"borrowed" IP protocol number.  The sysctl net.inet.raw.olddiverterror
controls whether opening a socket with the "borrowed" IP protocol
fails with an accompanying kernel printf; this code should last only a
couple of releases.

Approved by:	re
2002-10-29 16:46:13 +00:00
Maxim Konovalov
a98d88ad3e Lower a priority of "session drop" messages.
Requested by:	Eugene Grosbein <eugen@kuzbass.ru>
MFC after:	3 days
2002-10-29 08:53:14 +00:00
Maxime Henrion
d28e8b3a0d Oops, forgot to commit this file. This is part of the fix
for ipfw2 panics on sparc64.
2002-10-24 22:32:13 +00:00
Maxime Henrion
7c697970f4 Fix ipfw2 panics on 64-bit platforms.
Quoting luigi:

In order to make the userland code fully 64-bit clean it may
be necessary to commit other changes that may or may not cause
a minor change in the ABI.

Reviewed by:	luigi
2002-10-24 18:04:44 +00:00
Luigi Rizzo
18f13da2be src and dst address were erroneously swapped in SRC_SET and DST_SET
commands.  Use the correct one. Also affects ipfw2 in -stable.
2002-10-24 18:01:53 +00:00
Maxime Henrion
56e77afa59 Fix kernel build on sparc64 in the IPDIVERT case. 2002-10-24 09:58:50 +00:00
Ian Dowse
efac726eeb Unbreak the automatic remapping of an INADDR_ANY destination address
to the primary local IP address when doing a TCP connect(). The
tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr)
modifying the passed-in sockaddr, and I failed to notice this in
the recent change that added in_pcbconnect_setup(). As a result,
tcp_connect() was ending up using the unmodified sockaddr address
instead of the munged version.

There are two cases to handle: if in_pcbconnect_setup() succeeds,
then the PCB has already been updated with the correct destination
address as we pass it pointers to inp_faddr and inp_fport directly.
If in_pcbconnect_setup() fails due to an existing but dead connection,
then copy the destination address from the old connection.
2002-10-24 02:02:34 +00:00
Maxim Konovalov
ba3a9d459c Kill EOL spaces.
Approved by:	luigi
MFC after:	1 week
2002-10-23 10:07:55 +00:00
Maxim Konovalov
6b6874b20c Use syslog for messages about dropped sessions, do not flood a console.
Suggested by:	Eugene Grosbein <eugen@kuzbass.ru>
Approved by:	luigi
MFC after:	1 week
2002-10-23 10:05:19 +00:00
SUZUKI Shinsuke
2754d95d85 fixed a kernel crash by "ifconfig stf0 inet 1.2.3.4"
MFC after:	1 week
2002-10-22 22:50:38 +00:00
Ian Dowse
c557ae16ce Implement a new IP_SENDSRCADDR ancillary message type that permits
a server process bound to a wildcard UDP socket to select the IP
address from which outgoing packets are sent on a per-datagram
basis. When combined with IP_RECVDSTADDR, such a server process can
guarantee to reply to an incoming request using the same source IP
address as the destination IP address of the request, without having
to open one socket per server IP address.

Discussed on:	-net
Approved by:	re
2002-10-21 20:40:02 +00:00
Ian Dowse
90162a4e87 Remove the "temporary connection" hack in udp_output(). In order
to send datagrams from an unconnected socket, we used to first block
input, then connect the socket to the sendmsg/sendto destination,
send the datagram, and finally disconnect the socket and unblock
input.

We now use in_pcbconnect_setup() to check if a connect() would have
succeeded, but we never record the connection in the PCB (local
anonymous port allocation is still recorded, though). The result
from in_pcbconnect_setup() authorises the sending of the datagram
and selects the local address and port to use, so we just construct
the header and call ip_output().

Discussed on:	-net
Approved by:	re
2002-10-21 20:10:05 +00:00
Ian Dowse
5200e00e72 Replace in_pcbladdr() with a more generic inner subroutine for
in_pcbconnect() called in_pcbconnect_setup(). This version performs
all of the functions of in_pcbconnect() except for the final
committing of changes to the PCB. In the case of an EADDRINUSE error
it can also provide to the caller the PCB of the duplicate connection,
avoiding an extra in_pcblookup_hash() lookup in tcp_connect().

This change will allow the "temporary connect" hack in udp_output()
to be removed and is part of the preparation for adding the
IP_SENDSRCADDR control message.

Discussed on:	-net
Approved by:	re
2002-10-21 13:55:50 +00:00
Poul-Henning Kamp
53be11f680 Fix two instances of variant struct definitions in sys/netinet:
Remove the never completed _IP_VHL version, it has not caught on
anywhere and it would make us incompatible with other BSD netstacks
to retain this version.

Add a CTASSERT protecting sizeof(struct ip) == 20.

Don't let the size of struct ipq depend on the IPDIVERT option.

This is a functional no-op commit.

Approved by:	re
2002-10-20 22:52:07 +00:00
Robert Watson
c740509854 When a packet is multicast encapsulated, give labeled policies the
opportunity to preserve the label.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 21:59:00 +00:00
Ian Dowse
4b932371f4 Split out most of the logic from in_pcbbind() into a new function
called in_pcbbind_setup() that does everything except commit the
changes to the PCB. There should be no functional change here, but
in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR
control message implementation to check or allocate the source
address and port.

Discussed on:	-net
Approved by:	re
2002-10-20 21:44:31 +00:00
Maxime Henrion
d7f4d27a7a Several malloc() calls were passing the M_DONTWAIT flag
which is an mbuf allocation flag.  Use the correct
M_NOWAIT malloc() flag.  Fortunately, both were defined
to 1, so this commit is a no-op.
2002-10-19 11:31:50 +00:00
Hajimu UMEMOTO
b6e2845324 last arg of in6?_gif_output() is not used any more.
Obtained from:	KAME
MFC after:	3 weeks
2002-10-17 17:47:55 +00:00
Alfred Perlstein
dde2897f82 de-__P(). 2002-10-16 22:27:27 +00:00
Hajimu UMEMOTO
ab94625826 use encapcheck.
Obtained from:	KAME
MFC after:	3 weeks
2002-10-16 20:16:49 +00:00
Hajimu UMEMOTO
9426aedf7f - after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly.
- set IFF_UP on SIOCSIFADDR.  be consistent with others.
- set if_addrlen explicitly (just in case)
- multi destination mode is long gone.
- missing break statement
- add gif_set_tunnel(), so that we can set tunnel address from within the
  kernel at ease.
- encap_attach/detach dynamically on ioctls
- move encap_attach() to dedicated function in in*_gif.c

Obtained from:	KAME
MFC after:	3 weeks
2002-10-16 19:49:37 +00:00
Matthew Dillon
abac41a659 Fix oops in my last commit, I was calculating a new length but then not
using it.  (The code is already correct in -stable).

Found by: silby
2002-10-16 19:16:33 +00:00
Guido van Rooij
2f591ab8fe Get rid of checking for ip sec history. It is true that packets are not
supposed to be checked by the firewall rules twice. However, because the
various ipsec handlers never call ip_input(), this never happens anyway.

This fixes the situation where a gif tunnel is encrypted with IPsec. In
such a case, after IPsec processing, the unencrypted contents from the
GIF tunnel are fed back to the ipintrq and subsequently handeld by
ip_input(). Yet, since there still is IPSec history attached, the
packets coming out from the gif device are never fed into the filtering
code.
This fix was sent to Itojun, and he pointed towartds
    http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction.
This patch actually implements what is stated there (specifically:
Packet came from tunnel devices (gif(4) and ipip(4)) will still
go through ipf(4). You may need to identify these packets by
using interface name directive in ipf.conf(5).

Reviewed by:	rwatson
MFC after:	3 weeks
2002-10-16 09:01:48 +00:00
Sam Leffler
9b65723081 correct PCB locking in broadcast/multicast case that was exposed by change
to use udp_append

Reviewed by:	hsu
2002-10-16 02:33:28 +00:00
Sam Leffler
b9234fafa0 Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
Sam Leffler
5d84645305 Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
Sean Chittenden
927a76bb5e Increase the max dummynet hash size from 1024 to 65536. Default is still
1024.

Silence on:	-net, -ipfw 4weeks+
Reviewed by:	dd
Approved by:	knu (mentor)
MFC after:	3 weeks
2002-10-12 07:45:23 +00:00
Matthew Dillon
c8d50f2414 turn off debugging by default if bandwidth delay product limiting is
turned on (it is already off in -stable).
2002-10-10 21:41:30 +00:00
Matthew Dillon
28257b5ccc Update various comments mainly related to retransmit/FIN that I
documented while working on a previous bug.

Fix a PERSIST bug.  Properly account for a FIN sent during a PERSIST.

MFC after:	7 days
2002-10-10 19:21:50 +00:00
Maxim Konovalov
a5428e3a9a Fix IPOPT_TS processing: do not overwrite IP address by timestamp.
PR:		misc/42121
Submitted by:	Praveen Khurjekar <praveen@codito.com>
Reviewed by:	silence on -net
MFC after:	1 month
2002-10-10 12:03:36 +00:00
Maxim Sobolev
748bb23dcc Since bpf is no longer an optional component, remove associated ifdef's.
Submitted by:	don't quite remember - the name of the sender disappeared
		with the rest of my inbox. :(
2002-10-02 09:38:17 +00:00
Mike Barcroft
c0ec31f93e Include <sys/cdefs.h> so the visibility conditionals are available.
(This should have been included with the previous revision.)
2002-10-02 04:22:34 +00:00
Mike Barcroft
0cd4a9031e Use visibility conditionals. Only TCP_NODELAY ends up being defined
in the standards case.
2002-10-02 04:19:47 +00:00
Matthew Dillon
a84db8f49e Guido found another bug. There is a situation with
timestamped TCP packets where FreeBSD will send DATA+FIN and
A W2K box will ack just the DATA portion.  If this occurs
after FreeBSD has done a (NewReno) fast-retransmit and is
recovering it (dupacks > threshold) it triggers a case in
tcp_newreno_partial_ack() (tcp_newreno() in stable) where
tcp_output() is called with the expectation that the retransmit
timer will be reloaded.  But tcp_output() falls through and
returns without doing anything, causing the persist timer to be
loaded instead.  This causes the connection to hang until W2K gives up.
This occurs because in the case where only the FIN must be acked, the
'len' calculation in tcp_output() will be 0, a lot of checks will be
skipped, and the FIN check will also be skipped because it is designed
to handle FIN retransmits, not forced transmits from tcp_newreno().

The solution is to simply set TF_ACKNOW before calling tcp_output()
to absolute guarentee that it will run the send code and reset the
retransmit timer.  TF_ACKNOW is already used for this purpose in other
cases.

For some unknown reason this patch also seems to greatly reduce
the number of duplicate acks received when Guido runs his tests over
a lossy network.  It is quite possible that there are other
tcp_newreno{_partial_ack()} cases which were not generating the expected
output which this patch also fixes.

X-MFC after:	Will be MFC'd after the freeze is over
2002-09-30 18:55:45 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Peter Wemm
224af215a6 Zap now-unused SHLIB_MINOR 2002-09-28 00:25:32 +00:00
Maxim Konovalov
cb7641e85b Slightly rearrange a code in rev. 1.164:
o Move len initialization closer to place of its first usage.
o Compare len with 0 to improve readability.
o Explicitly zero out phlen in ip_insertoptions() in failure case.

Suggested by:   jhb
Reviewed by:    jhb
MFC after:      2 weeks
2002-09-23 08:56:24 +00:00
Alfred Perlstein
ebc82cbbf0 s/__attribute__((__packed__))/__packed/g 2002-09-23 06:25:08 +00:00
Mike Silbersack
c1c36a2c68 Fix issue where shutdown(socket, SHUT_RD) was effectively
ignored for TCP sockets.

NetBSD PR:	18185
Submitted by:	Sean Boudreau <seanb@qnx.com>
MFC after:	3 days
2002-09-22 02:54:07 +00:00
Poul-Henning Kamp
a5554bf05b Use m_fixhdr() rather than roll our own. 2002-09-18 19:43:01 +00:00
Matthew Dillon
fa55172bc0 Guido reported an interesting bug where an FTP connection between a
Windows 2000 box and a FreeBSD box could stall.  The problem turned out
to be a timestamp reply bug in the W2K TCP stack.  FreeBSD sends a
timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK
causing FreeBSD to calculate an insane SRTT and RTT, resulting in
a maximal retransmit timeout (60 seconds).  If there is any packet
loss on the connection for the first six or so packets the retransmit
case may be hit (the window will still be too small for fast-retransmit),
causing a 60+ second pause.  The W2K box gives up and closes the
connection.

This commit works around the W2K bug.

15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8]
15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)

Bug reported by: Guido van Rooij <guido@gvr.org>
2002-09-17 22:21:37 +00:00
Maxim Sobolev
563a9b6ecb Remove __RCSID().
Submitted by:	bde
2002-09-17 11:31:41 +00:00
Maxim Konovalov
1cf4349926 Explicitly clear M_FRAG flag on a mbuf with the last fragment to unbreak
ip fragments reassembling for loopback interface.

Discussed with:	bde, jlemon
Reviewed by:	silence on -net
MFC after:	2 weeks
2002-09-17 11:20:02 +00:00
Maxim Konovalov
e079ba8d93 In rare cases when there is no room for ip options ip_insertoptions()
can fail and corrupt a header length. Initialize len and check what
ip_insertoptions() returns.

Reviewed by:	archie, silence on -net
MFC after:	5 days
2002-09-17 11:13:04 +00:00
Jennifer Yang
4a03a8a8c7 Tempary fix for inet6. The final fix is to change in6_pcbnotify to take pcbinfo instead
of pcbhead. It is on the way.
2002-09-17 03:19:43 +00:00
Maxim Sobolev
2b82e3b367 Remove superfluous break. 2002-09-10 09:18:33 +00:00
Maxim Sobolev
565bb857d0 Since from now on encap_input() also catches IPPROTO_MOBILE and IPPROTO_GRE
packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify
IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach().

MFC after:	28 days
		(along with other if_gre changes)
2002-09-09 09:36:47 +00:00
Maxim Sobolev
c23d234cce Reduce namespace pollution by staticizing everything, which doesn't need to
be visible from outside of the module.
2002-09-06 18:16:03 +00:00
Maxim Sobolev
8e96e13e6a Add a new gre(4) driver, which could be used to create GRE (RFC1701)
and MOBILE (RFC2004) IP tunnels.

Obrained from:  NetBSD
2002-09-06 17:12:50 +00:00
Bruce Evans
40545cf5fc Fixed namespace pollution in uma changes:
- use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't
  a prerequisite.
- don't include <sys/uma.h>.
Namespace pollution makes "opaque" types like uma_zone_t perfectly
non-opaque.  Such types should never be used (see style(9)).

Fixed subsequently grwon dependencies of this header on its own pollution:
- include <sys/_mutex.h> and its prerequisite <sys/_lock.h> instead of
  depending on namespace pollution 2 layers deep in <sys/uma.h>.
2002-09-05 19:48:52 +00:00
Bruce Evans
c74af4fac1 Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending
on namespace pollution 4 layers deep in <netinet/in_pcb.h>.

Removed unused includes.  Sorted includes.
2002-09-05 15:33:30 +00:00
Maxim Sobolev
386fefa3a0 Add in_hosteq() and in_nullhost() macros to make life of developers
porting NetBSD code a little bit easier.

Obtained from:	NetBSD
2002-09-04 09:55:50 +00:00
Darren Reed
1851791868 some ipfilter files that accidently got imported here 2002-08-29 13:27:26 +00:00
Darren Reed
070700595d This commit was generated by cvs2svn to compensate for changes in r102514,
which included commits to RCS files with non-trunk default branches.
2002-08-28 13:26:01 +00:00
Philippe Charnier
93b0017f88 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
Crist J. Clark
784d7650f7 Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and
firewall logging on and off when at elevated securelevel(8). It would
be nice to be able to only lock these at securelevel >= 3, like rules
are, but there is no such functionality at present. I don't see reason
to be adding features to securelevel(8) with MAC being merged into 5.0.

PR:		kern/39396
Reviewed by:	luigi
MFC after:	1 week
2002-08-25 03:50:29 +00:00
Matthew Dillon
4f1e1f32b6 Correct bug in t_bw_rtttime rollover, #undef USERTT 2002-08-24 17:22:44 +00:00
Archie Cobbs
4a6a94d8d8 Replace (ab)uses of "NULL" where "0" is really meant. 2002-08-22 21:24:01 +00:00
Mike Barcroft
abbd890233 o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
  macros, which are only MD because of gratuitous differences between
  architectures.
o Change all headers to make use of this.  This mainly involves
  changing:
    #ifdef _BSD_FOO_T_
    typedef	_BSD_FOO_T_	foo_t;
    #undef _BSD_FOO_T_
    #endif
  to:
    #ifndef _FOO_T_DECLARED
    typedef	__foo_t	foo_t;
    #define	_FOO_T_DECLARED
    #endif

Concept by:	bde
Reviewed by:	jake, obrien
2002-08-21 16:20:02 +00:00
Don Lewis
26ef6ac4df Create new functions in_sockaddr(), in6_sockaddr(), and
in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in*
structure and initialize it with the address and port information passed
as arguments.  Use calls to these new functions to replace code that is
replicated multiple times in in_setsockaddr(), in_setpeeraddr(),
in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and
in6_mapped_peeraddr().  Inline COMMON_END in tcp_usr_accept() so that
we can call in_sockaddr() with temporary copies of the address and port
after the PCB is unlocked.

Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC()
inside in6_mapped_peeraddr() while the PCB is locked) by changing
the implementation of tcp6_usr_accept() to match tcp_usr_accept().

Reviewed by:	suz
2002-08-21 11:57:12 +00:00
Juli Mallett
ded7008a07 Enclose IPv6 addresses in brackets when they are displayed printable with a
TCP/UDP port seperated by a colon.  This is for the log_in_vain facility.

Pointed out by:	Edward J. M. Brocklesby
Reviewed by:	ume
MFC after:	2 weeks
2002-08-19 19:47:13 +00:00
Luigi Rizzo
306fe283a1 Raise limit for port lists to 30 entries/ranges.
Remove a duplicate "logging" message, and identify the firewall
as ipfw2 in the boot message.
2002-08-19 04:45:01 +00:00
Matthew Dillon
1fcc99b5de Implement TCP bandwidth delay product window limiting, similar to (but
not meant to duplicate) TCP/Vegas.  Add four sysctls and default the
implementation to 'off'.

net.inet.tcp.inflight_enable	enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug	debugging (defaults to 1=on)
net.inet.tcp.inflight_min	minimum window limit
net.inet.tcp.inflight_max	maximum window limit

MFC after:	1 week
2002-08-17 18:26:02 +00:00
Jeffrey Hsu
c068736a61 Cosmetic-only changes for readability.
Reviewed by:	(early form passed by) bde
Approved by:	itojun (from core@kame.net)
2002-08-17 02:05:25 +00:00
Luigi Rizzo
99e5e64504 sys/netinet/ip_fw2.c:
Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops
    for firewall-generated packets (the constant has to go in sys/mbuf.h).

    Better comments on keepalive generation, and enforce dyn_rst_lifetime
    and dyn_fin_lifetime to be less than dyn_keepalive_period.

    Enforce limits (up to 64k) on the number of dynamic buckets, and
    retry allocation with smaller sizes.

    Raise default number of dynamic rules to 4096.

    Improved handling of set of rules -- now you can atomically
    enable/disable multiple sets, move rules from one set to another,
    and swap sets.

sbin/ipfw/ipfw2.c:

    userland support for "noerror" pipe attribute.

    userland support for sets of rules.

    minor improvements on rule parsing and printing.

sbin/ipfw/ipfw.8:

    more documentation on ipfw2 extensions, differences from ipfw1
    (so we can use the same manpage for both), stateful rules,
    and some additional examples.
    Feedback and more examples needed here.
2002-08-16 10:31:47 +00:00
Alfred Perlstein
e88894d39a make the strings for tcptimers, tanames and prurequests const to silence
warnings.
2002-08-16 09:07:59 +00:00
Robert Watson
365433d9b8 Code formatting sync to trustedbsd_mac: don't perform an assignment
in an if clause.

PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
2002-08-15 22:04:31 +00:00
Robert Watson
fb95b5d3c3 Rename mac_check_socket_receive() to mac_check_socket_deliver() so that
we can use the names _receive() and _send() for the receive() and send()
checks.  Rename related constants, policy implementations, etc.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 18:51:27 +00:00
Jeffrey Hsu
b5addd8564 Reset dupack count in header prediction.
Follow-on to rev 1.39.

Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon
2002-08-15 17:13:18 +00:00
Luigi Rizzo
4bbf3b8b3a Kernel support for a dummynet option:
When a pipe or queue has the "noerror" attribute, do not report
drops to the caller (ip_output() and friends).
(2 lines to implement it, 2 lines to document it.)

This will let you simulate losses on the sender side as if they
happened in the middle of the network, i.e. with no explicit feedback
to the sender.

manpage and ipfw2.c changes to follow shortly, together with other
ipfw2 changes.

Requested by: silby
MFC after: 3 days
2002-08-15 16:53:43 +00:00
Robert Watson
ecd3e8ff5a It's now sufficient to rely on a nested include of _label.h to make sure
all structures in ip_var.h are defined, so remove include of mac.h.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:34:45 +00:00
Robert Watson
9daf40feaa Perform a nested include of _label.h if #ifdef _KERNEL. This will
satisfy consumers of ip_var.h that need a complete definition of
struct ipq and don't include mac.h.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:34:02 +00:00
Robert Watson
3b6aad64bf Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which
is no longer present.

Pointed out by:	bmilekic
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 14:27:46 +00:00
Poul-Henning Kamp
ae89fdaba7 remove spurious printf 2002-08-13 19:13:23 +00:00
Jennifer Yang
3d6ade3a03 Assert that the inpcb lock is held when calling tcp_output().
Approved by:	hsu
2002-08-12 03:22:46 +00:00
Luigi Rizzo
43405724ec One bugfix and one new feature.
The bugfix (ipfw2.c) makes the handling of port numbers with
a dash in the name, e.g. ftp-data, consistent with old ipfw:
use \\ before the - to consider it as part of the name and not
a range separator.

The new feature (all this description will go in the manpage):

each rule now belongs to one of 32 different sets, which can
be optionally specified in the following form:

	ipfw add 100 set 23 allow ip from any to any

If "set N" is not specified, the rule belongs to set 0.

Individual sets can be disabled, enabled, and deleted with the commands:

	ipfw disable set N
	ipfw enable set N
	ipfw delete set N

Enabling/disabling of a set is atomic. Rules belonging to a disabled
set are skipped during packet matching, and they are not listed
unless you use the '-S' flag in the show/list commands.
Note that dynamic rules, once created, are always active until
they expire or their parent rule is deleted.
Set 31 is reserved for the default rule and cannot be disabled.

All sets are enabled by default. The enable/disable status of the sets
can be shown with the command

	ipfw show sets

Hopefully, this feature will make life easier to those who want to
have atomic ruleset addition/deletion/tests. Examples:

To add a set of rules atomically:

	ipfw disable set 18
	ipfw add ... set 18 ...		# repeat as needed
	ipfw enable set 18

To delete a set of rules atomically

	ipfw disable set 18
	ipfw delete set 18
	ipfw enable set 18

To test a ruleset and disable it and regain control if something
goes wrong:

	ipfw disable set 18
	ipfw add ... set 18 ...         # repeat as needed
	ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18

    here if everything goes well, you press control-C before
    the "sleep" terminates, and your ruleset will be left
    active. Otherwise, e.g. if you cannot access your box,
    the ruleset will be disabled after the sleep terminates.

I think there is only one more thing that one might want, namely
a command to assign all rules in set X to set Y, so one can
test a ruleset using the above mechanisms, and once it is
considered acceptable, make it part of an existing ruleset.
2002-08-10 04:37:32 +00:00
Mike Silbersack
a9ce5e05b5 Handle PMTU discovery in syn-ack packets slightly differently;
rely on syncache flags instead of directly accessing the route
entry.

MFC after:	3 days
2002-08-05 22:34:15 +00:00
Luigi Rizzo
1cbd978e96 bugfix: move check for udp_blackhole before the one for icmp_bandlim.
MFC after: 3 days
2002-08-04 20:50:13 +00:00
Luigi Rizzo
ea779ff36c Fix handling of packets which matched an "ipfw fwd" rule on the input side. 2002-08-03 14:59:45 +00:00
Robert Watson
e316463a86 When preserving the IP header in extra mbuf in the IP forwarding
case, also preserve the MAC label.  Note that this mbuf allocation
is fairly non-optimal, but not my fault.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-02 20:45:27 +00:00
Robert Watson
09a555cbf9 Work to fix LINT build.
Reported by:	phk
2002-08-02 18:08:14 +00:00
Robert Watson
bdb3fa1832 Introduce support for Mandatory Access Control and extensible
kernel access control.

Add MAC support for the UDP protocol.  Invoke appropriate MAC entry
points to label packets that are generated by local UDP sockets,
and to authorize delivery of mbufs to local sockets both in the
multicast/broadcast case and the unicast case.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 21:37:34 +00:00
Robert Watson
d00e44fb4a Document the undocumented assumption that at least one of the PCB
pointer and incoming mbuf pointer will be non-NULL in tcp_respond().
This is relied on by the MAC code for correctness, as well as
existing code.

Obtained from:	TrustedBSD PRoject
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:54:43 +00:00
Robert Watson
0070e096d7 Introduce support for Mandatory Access Control and extensible
kernel access control.

Add support for labeling most out-going ICMP messages using an
appropriate MAC entry point.  Currently, we do not explicitly
label packet reflect (timestamp, echo request) ICMP events,
implicitly using the originating packet label since the mbuf is
reused.  This will be made explicit at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:53:04 +00:00
Robert Watson
c488362e1a Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the TCP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check socket and
mbuf labels before permitting delivery to a socket.  Assign labels
to newly accepted connections when the syncache/cookie code has done
its business.  Also set peer labels as convenient.  Currently,
MAC policies cannot influence the PCB matching algorithm, so cannot
implement polyinstantiation.  Note that there is at least one case
where a PCB is not available due to the TCP packet not being associated
with any socket, so we don't label in that case, but need to handle
it in a special manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 19:06:49 +00:00
Robert Watson
4ea889c666 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the raw IP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check the
socket and mbuf labels before permitting delivery to a socket,
permitting MAC policies to selectively allow delivery of raw IP mbufs
to various raw IP sockets that may be open.  Restructure the policy
checking code to compose IPsec and MAC results in a more readable
manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 18:30:34 +00:00
Robert Watson
4ed84624a2 Introduce support for Mandatory Access Control and extensible
kernel access control.

When fragmenting an IP datagram, invoke an appropriate MAC entry
point so that MAC labels may be copied (...) to the individual
IP fragment mbufs by MAC policies.

When IP options are inserted into an IP datagram when leaving a
host, preserve the label if we need to reallocate the mbuf for
alignment or size reasons.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:21:01 +00:00
Robert Watson
36b0360b37 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the code managing IP fragment reassembly queues (struct ipq)
to invoke appropriate MAC entry points to maintain a MAC label on
each queue.  Permit MAC policies to associate information with a queue
based on the mbuf that caused it to be created, update that information
based on further mbufs accepted by the queue, influence the decision
making process by which mbufs are accepted to the queue, and set the
label of the mbuf holding the reassembled datagram following reassembly
completetion.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:17:51 +00:00
Robert Watson
0ec4b12334 Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an IGMP message, invoke a MAC entry point to permit
the MAC framework to label its mbuf appropriately for the target
interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:46:56 +00:00
Robert Watson
19527d3e22 Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating an ARP query, invoke a MAC entry point to permit the
MAC framework to label its mbuf appropriately for the interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:45:16 +00:00
Robert Watson
d3990b06e1 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the MAC framework to label mbuf created using divert sockets.
These labels may later be used for access control on delivery to
another socket, or to an interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI LAbs
2002-07-31 16:42:47 +00:00
Robert Watson
549e4c9e4e Introduce support for Mandatory Access Control and extensible
kernel access control.

Label IP fragment reassembly queues, permitting security features to
be maintained on those objects.  ipq_label will be used to manage
the reassembly of fragments into IP datagrams using security
properties.  This permits policies to deny the reassembly of fragments,
as well as influence the resulting label of a datagram following
reassembly.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 23:09:20 +00:00
Maxim Konovalov
d46a53126c Use a common way to release locks before exit.
Reviewed by:	hsu
2002-07-29 09:01:39 +00:00
Don Lewis
5c38b6dbce Wire the sysctl output buffer before grabbing any locks to prevent
SYSCTL_OUT() from blocking while locks are held.  This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
2002-07-28 19:59:31 +00:00
Hajimu UMEMOTO
66ef17c4b6 make setsockopt(IPV6_V6ONLY, 0) actuall work for tcp6.
MFC after:	1 week
2002-07-25 18:10:04 +00:00
Hajimu UMEMOTO
eccb7001ee cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,
ip6_mapped_addr_on is unified into ip6_v6only.

MFC after:	1 week
2002-07-25 17:40:45 +00:00
Luigi Rizzo
be1826c354 Only log things net.inet.ip.fw.verbose is set 2002-07-24 02:41:19 +00:00
Ruslan Ermilov
61a875d706 Don't forget to recalculate the IP checksum of the original
IP datagram embedded into ICMP error message.

Spotted by:	tcpdump 3.7.1 (-vvv)
MFC after:	3 days
2002-07-23 00:16:19 +00:00
Ruslan Ermilov
88c39af35f Don't shrink socket buffers in tcp_mss(), application might have already
configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows.

PR:		kern/11966
MFC after:	1 week
2002-07-22 22:31:09 +00:00
Hajimu UMEMOTO
854d3b19a2 do not refer to IN6P_BINDV6ONLY anymore.
Obtained from:	KAME
MFC after:	1 week
2002-07-22 15:51:02 +00:00
John Polstra
8ea8a6804b Fix overflows in intermediate calculations in sysctl_msec_to_ticks().
At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle
to be reported as negative.

MFC after:	3 days
2002-07-20 23:48:59 +00:00
Robert Watson
69dac2ea47 Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel
data structures pick up security and synchronization primitives, it
becomes increasingly desirable not to arbitrarily export them via
include files to userland, as the userland applications pick up new
#include dependencies.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-20 22:46:20 +00:00
Matthew Dillon
d65bf08af3 Add the tcps_sndrexmitbad statistic, keep track of late acks that caused
unnecessary retransmissions.
2002-07-19 18:29:38 +00:00
Matthew Dillon
701bec5a38 Introduce two new sysctl's:
net.inet.tcp.rexmit_min (default 3 ticks equiv)

    This sysctl is the retransmit timer RTO minimum,
    specified in milliseconds.  This value is
    designed for algorithmic stability only.

net.inet.tcp.rexmit_slop (default 200ms)

    This sysctl is the retransmit timer RTO slop
    which is added to every retransmit timeout and
    is designed to handle protocol stack overheads
    and delayed ack issues.

Note that the *original* code applied a 1-second
RTO minimum but never applied real slop to the RTO
calculation, so any RTO calculation over one second
would have no slop and thus not account for
protocol stack overheads (TCP timestamps are not
a measure of protocol turnaround!).  Essentially,
the original code made the RTO calculation almost
completely irrelevant.

Please note that the 200ms slop is debateable.
This commit is not meant to be a line in the sand,
and if the community winds up deciding that increasing
it is the correct solution then it's easy to do.
Note that larger values will destroy performance
on lossy networks while smaller values may result in
a greater number of unnecessary retransmits.
2002-07-18 19:06:12 +00:00
Luigi Rizzo
90780c4b05 Move IPFW2 definition before including ip_fw.h
Make indentation of new parts consistent with the style used for this file.
2002-07-18 05:18:41 +00:00
Matthew Dillon
22fd54d461 I don't know how the minimum retransmit timeout managed to get set to
one second but it badly breaks throughput on networks with minor packet
loss.

Complaints by: at least two people tracked down to this.
MFC after:	3 days
2002-07-17 23:32:03 +00:00
Luigi Rizzo
318aa87b59 Fix a panic when doing "ipfw add pipe 1 log ..."
Also synchronize ip_dummynet.c with the version in RELENG_4 to
ease MFC's.
2002-07-17 07:21:42 +00:00
Luigi Rizzo
a8c102a2ec Implement keepalives for dynamic rules, so they will not expire
just because you leave your session idle.

Also, put in a fix for 64-bit architectures (to be revised).

In detail:

ip_fw.h

  * Reorder fields in struct ip_fw to avoid alignment problems on
    64-bit machines. This only masks the problem, I am still not
    sure whether I am doing something wrong in the code or there
    is a problem elsewhere (e.g. different aligmnent of structures
    between userland and kernel because of pragmas etc.)

  * added fields in dyn_rule to store ack numbers, so we can
    generate keepalives when the dynamic rule is about to expire

ip_fw2.c

  * use a local function, send_pkt(), to generate TCP RST for Reset rules;

  * save about 250 bytes by cleaning up the various snprintf()
    in ipfw_log() ...

  * ... and use twice as many bytes to implement keepalives
    (this seems to be working, but i have not tested it extensively).

Keepalives are generated once every 5 seconds for the last 20 seconds
of the lifetime of a dynamic rule for an established TCP flow.  The
packets are sent to both sides, so if at least one of the endpoints
is responding, the timeout is refreshed and the rule will not expire.

You can disable this feature with

        sysctl net.inet.ip.fw.dyn_keepalive=0

(the default is 1, to have them enabled).

MFC after: 1 day

(just kidding... I will supply an updated version of ipfw2 for
RELENG_4 tomorrow).
2002-07-14 23:47:18 +00:00
Luigi Rizzo
3956b02345 Avoid dereferencing a null pointer in ro_rt.
This was always broken in HEAD (the offending statement was introduced
in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev.
1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30.

So I am also restoring these two lines in RELENG_4 now.
We might need another few things from 1.99.2.30.
2002-07-12 22:08:47 +00:00
Don Lewis
2d20c83f93 Back out the previous change, since it looks like locking udbinfo provides
sufficient protection.
2002-07-12 09:55:48 +00:00
Don Lewis
bb1dd7a45a Lock inp while we're accessing it. 2002-07-12 08:05:22 +00:00
Don Lewis
0e1eebb846 Defer calling SYSCTL_OUT() until after the locks have been released. 2002-07-11 23:18:43 +00:00
Don Lewis
142b2bd644 Reduce the nesting level of a code block that doesn't need to be in
an else clause.
2002-07-11 23:13:31 +00:00
Luigi Rizzo
c7ea683135 Change one variable to make it easier to switch between ipfw and ipfw2 2002-07-09 06:53:38 +00:00
Luigi Rizzo
b3063f064c Fix a bug caused by dereferencing an invalid pointer when
no punch_fw was used.
Fix another couple of bugs which prevented rules from being
installed properly.

On passing, use IPFW2 instead of NEW_IPFW to compile the new code,
and slightly simplify the instruction generation code.
2002-07-08 22:57:35 +00:00
Luigi Rizzo
d63b346ab1 No functional changes, but:
Following Darren's suggestion, make Dijkstra happy and rewrite the
ipfw_chk() main loop removing a lot of goto's and using instead a
variable to store match status.

Add a lot of comments to explain what instructions are supposed to
do and how -- this should ease auditing of the code and make people
more confident with it.

In terms of code size: the entire file takes about 12700 bytes of text,
about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!)
for ipfw_log().
2002-07-08 22:46:01 +00:00
Luigi Rizzo
7d4d3e9051 Remove one unused command name. 2002-07-08 22:39:19 +00:00
Luigi Rizzo
5185195169 Forgot to update one field name in one of the latest commits. 2002-07-08 22:37:55 +00:00
Luigi Rizzo
5e43aef891 Implement the last 2-3 missing instructions for ipfw,
now it should support all the instructions of the old ipfw.

Fix some bugs in the user interface, /sbin/ipfw.

Please check this code against your rulesets, so i can fix the
remaining bugs (if any, i think they will be mostly in /sbin/ipfw).

Once we have done a bit of testing, this code is ready to be MFC'ed,
together with a bunch of other changes (glue to ipfw, and also the
removal of some global variables) which have been in -current for
a couple of weeks now.

MFC after: 7 days
2002-07-05 22:43:06 +00:00
Brian Somers
27cc91fbf8 Remove trailing whitespace 2002-07-01 11:19:40 +00:00
Jesper Skriver
eb538bfd64 Extend the effect of the sysctl net.inet.tcp.icmp_may_rst
so that, if we recieve a ICMP "time to live exceeded in transit",
(type 11, code 0) for a TCP connection on SYN-SENT state, close
the connection.

MFC after:	2 weeks
2002-06-30 20:07:21 +00:00
Jonathan Lemon
0080a004d7 One possible code path for syncache_respond() is:
syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)

Which winds up deleting a different entry from the syncache.  Handle
this by not utilizing the next entry in the timer chain until after
syncache_respond() completes.  The case of A == B should not be possible.

Problem found by: Don Bowman <don@sandvine.com>
2002-06-28 19:12:38 +00:00
Doug Rabson
24f8fd9fd1 Fix warning.
Reviewed by: luigi
2002-06-28 08:36:26 +00:00
Luigi Rizzo
9758b77ff1 The new ipfw code.
This code makes use of variable-size kernel representation of rules
(exactly the same concept of BPF instructions, as used in the BSDI's
firewall), which makes firewall operation a lot faster, and the
code more readable and easier to extend and debug.

The interface with the rest of the system is unchanged, as witnessed
by this commit. The only extra kernel files that I am touching
are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In
userland I only had to touch those programs which manipulate the
internal representation of firewall rules).

The code is almost entirely new (and I believe I have written the
vast majority of those sections which were taken from the former
ip_fw.c), so rather than modifying the old ip_fw.c I decided to
create a new file, sys/netinet/ip_fw2.c .  Same for the user
interface, which is in sbin/ipfw/ipfw2.c (it still compiles to
/sbin/ipfw).  The old files are still there, and will be removed
in due time.

I have not renamed the header file because it would have required
touching a one-line change to a number of kernel files.

In terms of user interface, the new "ipfw" is supposed to accepts
the old syntax for ipfw rules (and produce the same output with
"ipfw show". Only a couple of the old options (out of some 30 of
them) has not been implemented, but they will be soon.

On the other hand, the new code has some very powerful extensions.
First, you can put "or" connectives between match fields (and soon
also between options), and write things like

ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any

This should make rulesets slightly more compact (and lines longer!),
by condensing 2 or more of the old rules into single ones.

Also, as an example of how easy the rules can be extended, I have
implemented an 'address set' match pattern, where you can specify
an IP address in a format like this:

        10.20.30.0/26{18,44,33,22,9}

which will match the set of hosts listed in braces belonging to the
subnet 10.20.30.0/26 . The match is done using a bitmap, so it is
essentially a constant time operation requiring a handful of CPU
instructions (and a very small amount of memmory -- for a full /24
subnet, the instruction only consumes 40 bytes).

Again, in this commit I have focused on functionality and tried
to minimize changes to the other parts of the system. Some performance
improvement can be achieved with minor changes to the interface of
ip_fw_chk_t. This will be done later when this code is settled.

The code is meant to compile unmodified on RELENG_4 (once the
PACKET_TAG_* changes have been merged), for this reason
you will see #ifdef __FreeBSD_version in a couple of places.
This should minimize errors when (hopefully soon) it will be time
to do the MFC.
2002-06-27 23:02:18 +00:00
Maxime Henrion
7627c6cbcc Warning fixes for 64 bits platforms. With this last fix,
I can build a GENERIC sparc64 kernel with -Werror.

Reviewed by:	luigi
2002-06-27 11:02:06 +00:00
Luigi Rizzo
713a6ea063 Just a comment on some additional consistency checks that could
be added here.
2002-06-26 21:00:53 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Jeffrey Hsu
6fd22caf91 Avoid unlocking the inp twice if badport_bandlim() returns -1.
Reported by:	jlemon
2002-06-24 22:25:00 +00:00
Jeffrey Hsu
f14e4cfe33 Style bug: fix 4 space indentations that should have been tabs.
Submitted by:	jlemon
2002-06-24 16:47:02 +00:00
Luigi Rizzo
f10e85d797 Slightly restructure the #ifdef INET6 sections to make the code
more readable.

Remove the six "register" attributes from variables tcp_output(), the
compiler surely knows well how to allocate them.
2002-06-23 21:25:36 +00:00
Luigi Rizzo
410bb1bfe2 Move two global variables to automatic variables within the
only function where they are used (they are used with TCPDEBUG only).
2002-06-23 21:22:56 +00:00
Luigi Rizzo
4d2e36928d Move some global variables in more appropriate places.
Add XXX comments to mark places which need to be taken care of
if we want to remove this part of the kernel from Giant.

Add a comment on a potential performance problem with ip_forward()
2002-06-23 20:48:26 +00:00
Luigi Rizzo
51aed12e52 fix bad indentation and whitespace resulting from cut&paste 2002-06-23 09:15:43 +00:00
Luigi Rizzo
dfd1ae2f86 fix indentation of a comment 2002-06-23 09:14:24 +00:00
Luigi Rizzo
a5924d6100 fix a typo in a comment 2002-06-23 09:13:46 +00:00
Luigi Rizzo
ec3057db9e Remove ip_fw_fwd_addr (forgotten in previous commit)
remove some extra whitespace.
2002-06-23 09:03:42 +00:00
Luigi Rizzo
2b25acc158 Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

        ip_divert_cookie        used by divert sockets
        ip_fw_fwd_addr          used for transparent ip redirection
        last_pkt                used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

On passing, remove a bug in divert handling of fragmented packet.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). On passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
 * there is at least one global variable left, sro_fwd, in ip_output().
   I am not sure if/how this can be removed.

 * I have deliberately avoided gratuitous style changes in this commit
   to avoid cluttering the diffs. Minor stule cleanup will likely be
   necessary

 * this commit only focused on the IP layer. I am sure there is a
   number of global variables used in the TCP and maybe UDP stack.

 * despite the number of files touched, there are absolutely no API's
   or data structures changed by this commit (except the interfaces of
   ip_fw_chk() and dummynet_io(), which are internal anyways), so
   an MFC is quite safe and unintrusive (and desirable, given the
   improved readability of the code).

MFC after: 10 days
2002-06-22 11:51:02 +00:00
Jeffrey Hsu
2ded288c88 Fix logic which resulted in missing a call to INP_UNLOCK().
Submitted by:	jlemon, mux
2002-06-21 22:54:16 +00:00
Jeffrey Hsu
2d40081d1f TCP notify functions can change the pcb list. 2002-06-21 22:52:48 +00:00
Peter Wemm
532cf61bcf Solve the 'unregistered netisr 18' information notice with a sledgehammer.
Register the ISR early, but do not actually kick off the timer until we
see some activity.  This still saves us from running the arp timers on
a system with no network cards.
2002-06-20 01:27:40 +00:00
Seigo Tanimura
03e4918190 Remove so*_locked(), which were backed out by mistake. 2002-06-18 07:42:02 +00:00
Jeffrey Hsu
3ce144ea88 Notify functions can destroy the pcb, so they have to return an
indication of whether this happenned so the calling function
knows whether or not to unlock the pcb.

Submitted by:	Jennifer Yang (yangjihui@yahoo.com)
Bug reported by:  Sid Carter (sidcarter@symonds.net)
2002-06-14 08:35:21 +00:00
Mike Silbersack
eb5afeba22 Re-commit w/fix:
Ensure that the syn cache's syn-ack packets contain the same
  ip_tos, ip_ttl, and DF bits as all other tcp packets.

  PR:             39141
  MFC after:      2 weeks

This time, make sure that ipv4 specific code (aka all of the above)
is only run in the ipv4 case.
2002-06-14 03:08:05 +00:00
Mike Silbersack
70d2b17029 Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :)
Pointy-hat to:	silby
2002-06-14 02:43:20 +00:00
Mike Silbersack
21c3b2fc69 Ensure that the syn cache's syn-ack packets contain the same
ip_tos, ip_ttl, and DF bits as all other tcp packets.

PR:		39141
MFC after:	2 weeks
2002-06-14 02:36:34 +00:00
Jeffrey Hsu
9c68f33a9d Because we're holding an exclusive write lock on the head, references to
the new inp cannot leak out even though it has been placed on the head list.
2002-06-13 23:14:58 +00:00
Jeffrey Hsu
61ffc0b1a6 The UDP head was unlocked too early in one unicast case.
Submitted by:	bug reported by arr
2002-06-12 15:21:41 +00:00
Jeffrey Hsu
73dca2078d Fix logic which resulted in missing a call to INP_UNLOCK(). 2002-06-12 03:11:06 +00:00
Jeffrey Hsu
3cfcc388ea Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK.
Submitted by: tegge, jlemon

Prefer LIST_FOREACH macro.
  Submitted by: jlemon
2002-06-12 03:08:08 +00:00
Jeffrey Hsu
7a9378e7f5 Remember to initialize the control block head mutex. 2002-06-11 10:58:57 +00:00
Jeffrey Hsu
3d9baf34c0 Fix typo.
Submitted by:	Kyunghwan Kim <redjade@atropos.snu.ac.kr>
2002-06-11 10:56:49 +00:00
Jeffrey Hsu
e98d6424af Every array elt is initialized in the following loop, so remove
unnecessary M_ZERO.
2002-06-10 23:48:37 +00:00
Jeffrey Hsu
f76fcf6d4c Lock up inpcb.
Submitted by:	Jennifer Yang <yangjihui@yahoo.com>
2002-06-10 20:05:46 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Garrett Wollman
c7c5d95d56 Avoid unintentional trigraph. 2002-05-30 20:53:45 +00:00
Andrew R. Reiter
db40007d42 - Change the newly turned INVARIANTS #ifdef blocks (they were changed from
DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code
  readability.
2002-05-21 18:52:24 +00:00
Andrew R. Reiter
4cb674c960 - Turn a few DIAGNOSTIC into INVARIANTS since they are really sanity
checks.
2002-05-20 22:05:13 +00:00
Andrew R. Reiter
1e404e4e86 - Turn a DIAGNOSTIC into an INVARIANTS since it's a sanity check. Use
proper ``if'' statement style.
2002-05-20 22:04:19 +00:00
Andrew R. Reiter
e16f6e6200 - Turn a #ifdef DIAGNOSTIC to #ifdef INVARIANTS as the code from this line
through the #endif is really a sanity check.

Reviewed by: jake
2002-05-20 21:50:39 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
Kelly Yancey
c3a2190cdc Reset token-ring source routing control field on receipt of ethernet frame
without source routing information.  This restores the behaviour in this
scenario to that of prior to my last commit.
2002-05-15 01:03:32 +00:00
Robert Watson
f83c7ad731 Modify the arguments to syncache_socket() to include the mbuf (m) that
results in the syncache entry being turned into a socket.  While it's
not used in the main tree, this is required in the MAC tree so that
labels can be propagated from the mbuf to the socket.  This is also
useful if you're doing things like transparent IP connection hijacking
and you want to use the syncache/cookie mechanism, but we won't go
there.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-05-14 18:57:55 +00:00
Luigi Rizzo
4b9840932d Add ipfw hooks to ether_demux() and ether_output_frame().
Ipfw processing of frames at layer 2 can be enabled by the sysctl variable

	net.link.ether.ipfw=1

Consider this feature experimental, because right now, the firewall
is invoked in the places indicated below, and controlled by the
sysctl variables listed on the right.  As a consequence, a packet
can be filtered from 1 to 4 times depending on the path it follows,
which might make a ruleset a bit hard to follow.

I will add an ipfw option to tell if we want a given rule to apply
to ether_demux() and ether_output_frame(), but we have run out of
flags in the struct ip_fw so i need to think a bit on how to implement
this.

		to upper layers
	     |			     |
	     +----------->-----------+
	     ^			     V
	[ip_input]		[ip_output]	net.inet.ip.fw.enable=1
	     |			     |
	     ^			     V
	[ether_demux]      [ether_output_frame]	net.link.ether.ipfw=1
	     |			     |
	     +->- [bdg_forward]-->---+		net.link.ether.bridge_ipfw=1
	     ^			     V
	     |			     |
		 to devices
2002-05-13 10:37:19 +00:00
Luigi Rizzo
2f8707ca5d Remove custom definitions (IP_FW_TCPF_SYN etc.) of TCP header flags
which are the same as the original ones (TH_SYN etc.)
2002-05-13 10:21:13 +00:00
Luigi Rizzo
201efb1913 Add code to match MAC header fields (at the moment supported on
bridged packets only, soon to come also for packets on ordinary
ether_input() and ether_output() paths. The syntax is

    ipfw add <action> MAC dst src type

where dst and src can be "any" or a MAC address optionallyfollowed
by a mask, e.g.

	10:20:30:40:50
	10:20:30:40:50/32
	10:20:30:40:50&ff:ff:ff:f0:ff:0f

and type can be a single ethernet type, a range, or a type followed by
a mask (values are always in hexadecimal) e.g.

	0800
	0800-0806
	0800/8
	0800&03ff

Note, I am still uncertain on what is the best format for inputting
these values, having the values in hexadecimal is convenient in most
cases but can be confusing sometimes. Suggestions welcome.

Implement suggestion from PR 37778 to allow "not me" on destination
and source IP. The code in the PR was slightly wrong and interfered
with the normal handling of IP addresses. This version hopefully is
correct.

Minor cleanup of the code, in some places moving the indentation to 4
spaces because the code was becoming too deep. Eventually, in a
separate commit, I will move the whole file to 4 space indent.
2002-05-12 20:43:50 +00:00
Dima Dorfman
11612afabe s/demon/daemon/ 2002-05-12 00:22:38 +00:00
Mike Barcroft
9828cf071d Remove some duplicate types that should have been removed as part of
the rearranging in the previous revision.

Pointy hat to:	cvs update (merging), mike (for not noticing)
2002-05-11 23:28:51 +00:00
Luigi Rizzo
d60315bef5 Cleanup the interface to ip_fw_chk, two of the input arguments
were totally useless and have been removed.

ip_input.c, ip_output.c:
    Properly initialize the "ip" pointer in case the firewall does an
    m_pullup() on the packet.

    Remove some debugging code forgotten long ago.

ip_fw.[ch], bridge.c:
    Prepare the grounds for matching MAC header fields in bridged packets,
    so we can have 'etherfw' functionality without a lot of kernel and
    userland bloat.
2002-05-09 10:34:57 +00:00
Kelly Yancey
42fdfc126a Move ISO88025 source routing information into sockaddr_dl's sdl_data
field.  This returns the sdl_data field to a variable-length field.  More
importantly, this prevents a easily-reproduceable data-corruption bug when
the interface name plus the hardware address exceed the sdl_data field's
original 12 byte limit.  However, token-ring interfaces may still overflow
the new sdl_data field's 46 byte limit if the interface name exceeds 6
characters (since 6 characters for interface name plus 6 for hardware
address plus 34 for source routing = the size of sdl_data).  Further
refinements could overcome this limitation but would break binary
compatibility; this commit only addresses fixing the bug for
commonly-occuring cases without breaking binary compatibility with the
intention that the functionality can be MFC'ed to -stable.

  See message ID's (both send to -arch):
	20020421013332.F87395-100000@gateway.posi.net
	20020430181359.G11009-300000@gateway.posi.net
  for a more thorough description of the bug addressed and how to
reproduce it.

Approved by:	silence on -arch and -net
Sponsored by:	NTT Multimedia Communications Labs
MFC after:	1 week
2002-05-07 22:14:06 +00:00
Hajimu UMEMOTO
8117063142 Revised MLD-related definitions
- Used mld_xxx and MLD_xxx instead of mld6_xxx and MLD6_xxx according
  to the official defintions in rfc2292bis
  (macro definitions for backward compatibility were provided)
- Changed the first member of mld_hdr{} from mld_hdr to mld_icmp6_hdr
  to avoid name space conflict in C++

This change makes ports/net/pchar compilable again under -CURRENT.

Obtained from:	KAME
2002-05-06 16:28:25 +00:00
Luigi Rizzo
43d11e8453 Indentation and comments cleanup, no functional change.
MFC after: 3 days
2002-05-05 21:27:47 +00:00
Alfred Perlstein
f132072368 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
Alfred Perlstein
59017610b2 Fix some edge cases where bad string handling could occur.
Submitted by: ps
2002-05-01 08:29:41 +00:00
Alfred Perlstein
ef1047305e cleanup:
fix line wraps, add some comments, fix macro definitions, fix for(;;) loops.
2002-05-01 08:08:24 +00:00
Crist J. Clark
0f56b10c4b Enlighten those who read the FINE POINTS of the documentation a bit
more on how ipfw(8) deals with tiny fragments. While we're at it, add
a quick log message to even let people know we dropped a packet. (Note
that the second FINE POINT is somewhat redundant given the first, but
since the code is there, leave the docs for it.)

MFC after:	1 day
2002-05-01 06:29:16 +00:00
Seigo Tanimura
960ed29c4b Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by:	bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.
2002-04-30 01:54:54 +00:00
Seigo Tanimura
d48d4b2501 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
Mike Barcroft
58631bbe0e Rearrange <netinet/in.h> so that it is easier to conditionalize
sections for various standards.  Conditionalize sections for various
standards.  Use standards conforming spelling for types in the
sockaddr_in structure.
2002-04-24 01:26:11 +00:00
Mike Barcroft
c6e43821cd Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h>
and <sys/socket.h>.  Previously, sa_family_t was only typedef'd in
<sys/socket.h>.
2002-04-20 02:24:35 +00:00
SUZUKI Shinsuke
88ff5695c1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
SUZUKI Shinsuke
f361efa0be initialize local variable explicitly
Reviewed by:	ume
Obtained from:	Fujitsu guys
MFC after:	1 week
2002-04-11 02:14:21 +00:00
Mike Silbersack
898568d8ab Remove some ISN generation code which has been unused since the
syncache went in.

MFC after:	3 days
2002-04-10 22:12:01 +00:00
Mike Silbersack
f2697d4d75 Totally nuke IPPORT_USERRESERVED, it is no longer used anywhere, update
remaining comments to reflect new ephemeral port range.

Reminded by:	Maxim Konovalov <maxim@macomnet.ru>
MFC after:	3 days
2002-04-10 19:30:58 +00:00
Mike Barcroft
13c3fcc238 Unconditionalize the definition of INET_ADDRSTRLEN and
INET6_ADDRSTRLEN.  Doing this helps expose bogus redefinitions in 3rd
party software.
2002-04-10 11:59:02 +00:00
Brian Somers
6ce6e2be71 Remove the code that masks an EEXIST returned from rtinit() when
calling ioctl(SIOC[AS]IFADDR).

This allows the following:

  ifconfig xx0 inet 1.2.3.1 netmask 0xffffff00
  ifconfig xx0 inet 1.2.3.17 netmask 0xfffffff0 alias
  ifconfig xx0 inet 1.2.3.25 netmask 0xfffffff8 alias
  ifconfig xx0 inet 1.2.3.26 netmask 0xffffffff alias

but would (given the above) reject this:

  ifconfig xx0 inet 1.2.3.27 netmask 0xfffffff8 alias

due to the conflicting netmasks.  I would assert that it's wrong
to mask the EEXIST returned from rtinit() as in the above scenario, the
deletion of the 1.2.3.25 address will leave the 1.2.3.27 address
as unroutable as it was in the first place.

Offered for review on: -arch, -net
Discussed with: stephen macmanus <stephenm@bayarea.net>
MFC after: 3 weeks
2002-04-10 01:42:44 +00:00
Brian Somers
5a43847d1c Don't add host routes for interface addresses of 0.0.0.0/8 -> 0.255.255.255.
This change allows bootp to work with more than one interface, at the
expense of some rather ``wrong'' looking code.  I plan to MFC this in
place of luigi's recent #ifdef BOOTP stuff that was committed to this
file in -stable, as that's slightly more wrong that this is.

Offered for review on: -arch, -net
MFC after: 2 weeks
2002-04-10 01:42:32 +00:00
John Baldwin
ad278afdf0 Change the first argument of prison_xinpcb() to be a thread pointer instead
of a proc pointer so that prison_xinpcb() can use td_ucred.
2002-04-09 20:04:10 +00:00
Mike Silbersack
c3b2fe55ba Update comments to reflect the recent ephemeral port range
change.

Noticed by:	ru
MFC After:	1 day
2002-04-09 18:01:26 +00:00
Matthew N. Dodd
b0c570df3f Retire this copy; it now lives in sys/net/fddi.h. 2002-04-05 19:24:38 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Mike Barcroft
8822d3fb83 o Implement <sys/_types.h>, a new header for storing types that are
MI, not required to be a fixed size, and used in multiple headers.
  This will grow in time, as more things move here from <sys/types.h>
  and <machine/ansi.h>.
o Add missing type definitions (uint16_t and uint32_t) to
  <arpa/inet.h> and <netinet/in.h>.
o Reduce pollution in <sys/types.h> by using `#if _FOO_T_DECLARED'
  widgets to avoid including <sys/stdint.h>.
o Add some missing type definitions to <unistd.h> and note the ones
  that still need to be added.
o Make use of <sys/_types.h> primitives in <grp.h> and <sys/types.h>.

Reviewed by:	bde
2002-04-01 08:12:25 +00:00
Bruce Evans
c1cd65bae8 Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.
2002-03-24 10:19:10 +00:00
Robert Watson
29dc1288b0 Merge from TrustedBSD MAC branch:
Move the network code from using cr_cansee() to check whether a
    socket is visible to a requesting credential to using a new
    function, cr_canseesocket(), which accepts a subject credential
    and object socket.  Implement cr_canseesocket() so that it does a
    prison check, a uid check, and add a comment where shortly a MAC
    hook will go.  This will allow MAC policies to seperately
    instrument the visibility of sockets from the visibility of
    processes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-03-22 19:57:41 +00:00
Ruslan Ermilov
e3f406b3c1 Prevent icmp_reflect() from calling ip_output() with a NULL route
pointer which will then result in the allocated route's reference
count never being decremented.  Just flood ping the localhost and
watch refcnt of the 127.0.0.1 route with netstat(1).

Submitted by:	jayanth

Back out ip_output.c,v 1.143 and ip_mroute.c,v 1.69 that allowed
ip_output() to be called with a NULL route pointer.  The previous
paragraph shows why this was a bad idea in the first place.

MFC after:	0 days
2002-03-22 16:45:54 +00:00
Mike Silbersack
9e5a5ed4c5 Change the ephemeral port range from 1024-5000 to 49152-65535.
This increases the number of concurrent outgoing connections from ~4000
to ~16000.  Other OSes (Solaris, OS X, NetBSD) and many other NAT
products have already made this change without ill effects, so we
should not run into any problems.

MFC after:	1 week
2002-03-22 03:28:11 +00:00
Orion Hodson
f0f3379ed5 Send periodic ARP requests when ARP entries for hosts we are sending
to are about to expire.  This prevents high packet rate flows from
experiencing packet drops at the sender following ARP cache entry
timeout.

PR:		kern/25517
Reviewed by:	luigi
MFC after:	7 days
2002-03-20 15:56:36 +00:00
Jeff Roberson
69c2d429c1 Switch vm_zone.h with uma.h. Change over to uma interfaces. 2002-03-20 05:48:55 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Robert Watson
16aae019a0 NAI DBA update 2002-03-14 16:53:39 +00:00
Mike Barcroft
6a6230d2f6 o Add INET_ADDRSTRLEN and INET6_ADDRSTRLEN defines to <arpa/inet.h>
for POSIX.1-2001 conformance.
o Add magic to <netinet/in.h> and <netinet6/in6.h> to prevent
  redefining INET_ADDRSTRLEN and INET6_ADDRSTRLEN.
o Add a note about missing typedefs in <arpa/inet.h>.
2002-03-10 06:42:27 +00:00
Mike Barcroft
d846855da8 o Don't require long long support in bswap64() functions.
o In i386's <machine/endian.h>, macros have some advantages over
  inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
  (previously __uint16_swap_uint32), it has some uses on i386's with
  PDP endianness.

Submitted by:	bde

o Move a comment up in <machine/endian.h> that was accidentially moved
  down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
  byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
  functions, so that the non-GCC (libc asm) case has proper
  prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
  _BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
  for platforms in which asm versions don't exist.  This significantly
  reduces the complexity of some things at the cost of duplicate code.

Reviewed by:	bde
2002-03-09 21:02:16 +00:00
Hajimu UMEMOTO
b7d6d9526c - Set inc_isipv6 in tcp6_usr_connect().
- When making a pcb from a sync cache, do not forget to copy inc_isipv6.

Obtained from:	KAME
MFC After:	1 week
2002-02-28 17:11:10 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Crist J. Clark
93ec91ba6d Change the wording of the inline comments from the previous commit.
Objection from:	ru
2002-02-27 13:52:06 +00:00
Alfred Perlstein
60d9b32e44 More IPV6 const fixes. 2002-02-27 05:11:50 +00:00
Dima Dorfman
76183f3453 Introduce a version field to `struct xucred' in place of one of the
spares (the size of the field was changed from u_short to u_int to
reflect what it really ends up being).  Accordingly, change users of
xucred to set and check this field as appropriate.  In the kernel,
this is being done inside the new cru2x() routine which takes a
`struct ucred' and fills out a `struct xucred' according to the
former.  This also has the pleasant sideaffect of removing some
duplicate code.

Reviewed by:	rwatson
2002-02-27 04:45:37 +00:00
Brooks Davis
673623b376 Staticize an extern that no one else used. 2002-02-26 18:24:00 +00:00
Chris D. Faulhaber
546f251b29 Enforce inbound IPsec SPD
Reviewed by:	fenner
2002-02-26 02:11:13 +00:00
Alfred Perlstein
6d53e16389 Document what inpcb->inp_vflag is for.
Submitted by: Marco Molteni <molter@tin.it>
2002-02-25 09:41:43 +00:00
Crist J. Clark
2ca2159f22 The TCP code did not do sufficient checks on whether incoming packets
were destined for a broadcast IP address. All TCP packets with a
broadcast destination must be ignored. The system only ignored packets
that were _link-layer_ broadcasts or multicast. We need to check the
IP address too since it is quite possible for a broadcast IP address
to come in with a unicast link-layer address.

Note that the check existed prior to CSRG revision 7.35, but was
removed. This commit effectively backs out that nine-year-old change.

PR:		misc/35022
2002-02-25 08:29:21 +00:00
Luigi Rizzo
ca462bcfcb BUGFIX: make use of the pointer to the target of skipto rules,
so that after the first time we can follow the pointer instead
of having to scan the list.
This was the intended behaviour from day one.

PR: 34639
MFC-after: 3 days
2002-02-20 17:15:57 +00:00
Jonathan Lemon
6b33ceb8e3 When expanding a syncache entry into a socket, inherit the socket options
from the current listen socket instead of the cached (and possibly stale)
TCB pointer.
2002-02-20 16:47:11 +00:00
Mike Barcroft
fd8e4ebc8c o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
  source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
  Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
  POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
  and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
  complexities associated with having MD (asm and inline) versions, and
  having to prevent exposure of these functions in other headers that
  happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
  third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on:	alpha, i386
Reviewed by:	bde, jake, tmm
2002-02-18 20:35:27 +00:00
Ruslan Ermilov
51c8ec4a3d Moved the 127/8 check below so that IPF redirects have a chance of working.
MFC after:	1 day
2002-02-15 12:19:03 +00:00
Jonathan Lemon
0cab7c4b08 When a duplicate SYN arrives which matches an entry in the syncache,
update our lazy reference to the inpcb structure, as it may have changed.

Found by: dima
2002-02-12 02:03:50 +00:00
Dima Dorfman
364efeccb0 Silence unused variable warning in the !KLD_MODULE case.
Submitted by:	archie
2002-02-10 22:22:05 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Hajimu UMEMOTO
c5993889d8 In tcp_respond(), correctly reset returned IPv6 header. This is essential
when the original packet contains an IPv6 extension header.

Obtained from:	KAME
MFC after:	1 week
2002-02-04 17:37:06 +00:00
Mark Murray
6ebd73f0f6 WARNS=n and lint(1) silencer. Declare an array of (const) strings
as const char.
2002-02-03 11:57:32 +00:00
Crist J. Clark
56962689cc The ipfw(8) 'tee' action simply hasn't worked on incoming packets for
some time. _All_ packets, regardless of destination, were accepted by
the machine as if addressed to it.

Jump back to 'pass' processing for a teed packet instead of falling
through as if it was ours.

PR:		kern/31130
Reviewed by:	-net, luigi
MFC after:	2 weeks
2002-01-26 10:14:08 +00:00
Jonathan Lemon
d9b7cc1c8d The ENDPTS_EQ macro was comparing the one of the fports to itself. Fix.
Submitted by: emy@boostworks.com
2002-01-22 17:54:28 +00:00
Hajimu UMEMOTO
a4a6e77341 - Check the address family of the destination cached in a PCB.
- Clear the cached destination before getting another cached route.
  Otherwise, garbage in the padding space (which might be filled in if it was
  used for IPv4) could annoy rtalloc.

Obtained from:	KAME
2002-01-21 20:04:22 +00:00
Ruslan Ermilov
8c3f5566ae RFC1122 requires that addresses of the form { 127, <any> } MUST NOT
appear outside a host.

PR:		30792, 33996
Obtained from:	ip_input.c
MFC after:	1 week
2002-01-21 13:59:42 +00:00
Ruslan Ermilov
84caf00847 Fix a panic condition in icmp_reflect() introduced in rev. 1.61.
(We should be able to handle locally originated IP packets, and
these do not have m_pkthdr.rcvif set.)

PR:		kern/32806, kern/33766
Reviewed by:	luigi
Fix tested by:	Maxim Konovalov <maxim@macomnet.ru>,
		Erwin Lansing <erwin@lansing.dk>
2002-01-11 12:13:57 +00:00
Mike Smith
bedbd47e6a Initialise the intrq_present fields at runtime, not link time. This allows
us to load protocols at runtime, and avoids the use of common variables.

Also fix the ip6_intrq assignment so that it works at all.
2002-01-08 10:34:03 +00:00
Crist J. Clark
db38d34abb Fix a missing "ipfw:" in a syslog message.
MFC after:	1 day
2002-01-07 07:12:09 +00:00
Bill Fenner
92bdb2fa39 Pre-calculate the checksum for multicast packets sourced on a
multicast router.  This is overkill; it should be possible to
delay to hardware interfaces and only pre-calculate when forwarding
to a tunnel.
2002-01-05 18:23:53 +00:00
Robert Watson
e6658b129e o Spelling fix in comment: tcp_ouput -> tcp_output 2002-01-04 17:21:27 +00:00
Yaroslav Tykhiy
d0ebc0d2f1 Don't reveal a router in the IPSTEALTH mode through IP options.
The following steps are involved:
a) the IP options related to routing (LSRR and SSRR) are processed
   as though the router were a host,
b) the other IP options are processed as usual only if the packet
   is destined for the router; otherwise they are ignored.

PR:		kern/23123
Discussed in:	freebsd-hackers
2001-12-29 09:24:18 +00:00
Julian Elischer
3efc30142c Fix ipfw fwd so that it acts as the docs say
when forwarding an incoming packet to another machine.

Obtained from:	Vicor Production tree
MFC after: 3 weeks
2001-12-28 21:21:57 +00:00
Yaroslav Tykhiy
37b5d6e33d Implement matching IP precedence in ipfw(4).
Submitted by:	Igor Timkin <ivt@gamma.ru>
2001-12-21 18:43:02 +00:00
Jonathan Lemon
6c19b85f43 Remove a change that snuck in from my private tree. 2001-12-21 05:07:39 +00:00
Jonathan Lemon
45a0329051 If syncookies are disabled (net.inet.tcp.syncookies) then use the faster
arc4random() routine to generate ISNs instead of creating them with MD5().

Suggested by: silby
2001-12-21 04:41:08 +00:00
Jonathan Lemon
e579ba1aea When storing an int value in a void *, use intptr_t as the cast type
(instead of int) to keep the 64 bit platforms happy.
2001-12-19 15:57:43 +00:00
Yaroslav Tykhiy
3f9e31220b Don't try to free a NULL route when doing IPFIREWALL_FORWARD.
An old route will be NULL at that point if a packet were initially
routed to an interface (using the IP_ROUTETOIF flag.)

Submitted by:	Igor Timkin <ivt@gamma.ru>
2001-12-19 14:54:13 +00:00
Jonathan Lemon
a9c9684163 Extend the SYN DoS defense by adding syncookies to the syncache.
All TCP ISNs that are sent out are valid cookies, which allows entries
in the syncache to be dropped and still have the ACK accepted later.
As all entries pass through the syncache, there is no sudden switchover
from cache -> cookies when the cache is full; instead, syncache entries
simply have a reduced lifetime.  More details may be found in the
"Resisting DoS attacks with a SYN cache" paper in the Usenix BSDCon 2002
conference proceedings.

Sponsored by: DARPA, NAI Labs
2001-12-19 06:12:14 +00:00
Ruslan Ermilov
4aa5d00e3d Fixed the bug in transparent TCP proxying with the "encode_ip_hdr"
option -- TcpAliasOut() did not catch the IP header length change.

Submitted by:	Stepachev Andrey <aka50@mail.ru>
2001-12-18 16:13:45 +00:00
Robert Watson
365979cdac o Add IPOPT_ESO for the 'Extended Security' IP option (RFC1108)
Obtained from:	TrustedBSD Project
2001-12-14 19:37:32 +00:00
Robert Watson
18e2b6a995 o Add definition for IPOPT_CIPSO, the commercial security IP option
number.

Submitted by:	Ilmar S. Habibulin <ilmar@watson.org>
Obtained from:	TrustedBSD Project
2001-12-14 19:34:42 +00:00
Jonathan Lemon
aa1f5daa31 whitespace and style fixes recovered from -stable. 2001-12-14 19:34:11 +00:00
Jonathan Lemon
6f00486cfd minor style and whitespace fixes. 2001-12-14 19:33:29 +00:00
Jonathan Lemon
effa274e9e whitespace fixes. 2001-12-14 19:32:47 +00:00
Jonathan Lemon
f8b6a631a2 minor whitespace fixes. 2001-12-14 19:32:00 +00:00
Mike Silbersack
3260eb18f3 Reduce the local network slowstart flightsize from infinity to 4 packets.
Now that we've increased the size of our send / receive buffers, bursting
an entire window onto the network may cause congestion.  As a result,
we will slow start beginning with a flightsize of 4 packets.

Problem reported by: Thomas Zenker <thz@Lennartz-electronic.de>

MFC after:	3 days
2001-12-14 18:26:52 +00:00
Jonathan Lemon
04cad5adb1 Undo one of my last minute changes; move sc_iss up earlier so it
is initialized in case we take the T/TCP path.
2001-12-13 04:05:26 +00:00
Jonathan Lemon
7c183182bf Fix up tabs from cut&n&paste. 2001-12-13 04:02:31 +00:00
Jonathan Lemon
0ef3206bf5 Fix up tabs in comments. 2001-12-13 04:02:09 +00:00
Jonathan Lemon
eaa6d8efe5 Minor style fixes. 2001-12-13 04:01:23 +00:00
Jonathan Lemon
c448c89c59 Minor style fix. 2001-12-13 04:01:01 +00:00
David E. O'Brien
6e551fb628 Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.
2001-12-10 08:09:49 +00:00
Robert Watson
c39a614e0d o Our currenty userland boot code (due to rc.conf and rc.network) always
enables TCP keepalives using the net.inet.tcp.always_keepalive by default.
  Synchronize the kernel default with the userland default.
2001-12-07 17:01:28 +00:00
Ruslan Ermilov
47891de1a5 Fixed remotely exploitable DoS in arpresolve().
Easily exploitable by flood pinging the target
host over an interface with the IFF_NOARP flag
set (all you need to know is the target host's
MAC address).

MFC after:	0 days
2001-12-05 18:13:34 +00:00
Robert Watson
011376308f o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
  pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
  so as to enforce these protections, in particular, in kern_mib.c
  protection sysctl access to the hostname and securelevel, as well as
  kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
  mib entries (osname, osrelease, osversion) so that they don't return
  a pointer to the text in the struct linux_prison, rather, a copy
  to an array passed into the calls.  Likewise, update linprocfs to
  use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
  accessing struct prison.

Reviewed by:	jhb
2001-12-03 16:12:27 +00:00
Matthew Dillon
262c1c1a4e Fix a bug with transmitter restart after receiving a 0 window. The
receiver was not sending an immediate ack with delayed acks turned on
when the input buffer is drained, preventing the transmitter from
restarting immediately.

Propogate the TCP_NODELAY option to accept()ed sockets.  (Helps tbench and
is a good idea anyway).

Some cleanup.  Identify additonal issues in comments.

MFC after:	1 day
2001-12-02 08:49:29 +00:00
Ruslan Ermilov
04d59553b2 Allow for ip_output() to be called with a NULL route pointer.
This fixes a panic I introduced yesterday in ip_icmp.c,v 1.64.
2001-12-01 13:48:16 +00:00
Mike Barcroft
de2656d0ed o Stop abusing MD headers with non-MD types.
o Hide nonstandard functions and types in <netinet/in.h> when
  _POSIX_SOURCE is defined.
o Add some missing types (required by POSIX.1-200x) to <netinet/in.h>.
o Restore vendor ID from Rev 1.1 in <netinet/in.h> and make use of new
  __FBSDID() macro.
o Fix some miscellaneous issues in <arpa/inet.h>.
o Correct final argument for the inet_ntop() function (POSIX.1-200x).
o Get rid of the namespace pollution from <sys/types.h> in
  <arpa/inet.h>.

Reviewed by:		fenner
Partially submitted by:	bde
2001-12-01 03:43:01 +00:00
Matthew Dillon
d912c694ee The transmit burst limit for newreno completely breaks TCP's performance
if the receive side is using delayed acks.  Temporarily remove it.

MFC after:	0 days
2001-11-30 21:33:39 +00:00
Brian Somers
0f02fdac67 During SIOCAIFADDR, if in_ifinit() fails and we've already added an
interface address, blow the address away again before returning the
error.

In in_ifinit(), if we get an error from rtinit() and we've also got
a destination address, return the error rather than masking EEXISTS.
Failing to create a host route when configuring an interface should
be treated as an error.
2001-11-30 14:00:55 +00:00
Ruslan Ermilov
bd7142087b - Make ip_rtaddr() global, and use it to look up the correct source
address in icmp_reflect().
- Two new "struct icmpstat" members: icps_badaddr and icps_noroute.

PR:		kern/31575
Obtained from:	BSD/OS
MFC after:	1 week
2001-11-30 10:40:28 +00:00
Dima Dorfman
3a33b1b3b7 ipfw_modevent(): Don't use an unnatural block to define a variable
(fcp) that's already defined in the outer block and isn't used
anywhere else.  This silences -Wunused.

Reviewed by:	md5(1)
2001-11-27 20:32:47 +00:00
Dima Dorfman
e8d41815df Remove debugging printfs that weren't conditional on any debugging
options in handling MOD_{UN,}LOAD (they weren't very useful, anyway).
2001-11-27 20:28:48 +00:00
Dima Dorfman
0d4bef5dd4 In icmp_reflect(): If the packet was not addressed to us and was
received on an interface without an IP address, try to find a
non-loopback AF_INET address to use.  If that fails, drop it.
Previously, we used the address at the top of the in_ifaddrhead list,
which didn't make much sense, and would cause a panic if there were no
AF_INET addresses configured on the system.

PR:		29337, 30524
Reviewed by:	ru, jlemon
Obtained from:	NetBSD
2001-11-27 19:58:09 +00:00
Robert Watson
38e04732fc Add include of net/route.h, as structures moved around due to the
syncache rely on 'struct route' being defined.  This fixes the
LINT build some.
2001-11-27 17:36:39 +00:00
Seigo Tanimura
df89626872 Clear a new syncache entry first, followed by filling in values. This
fixes route breakage due to uncleared gabage on my box.
2001-11-27 11:55:28 +00:00
Ruslan Ermilov
8573e68110 When servicing an internal FTP server, punch ipfirewall(4) holes
for passive mode data connections (PASV/EPSV -> 227/229).  Well,
the actual punching happens a bit later, when the aliasing link
becomes fully specified.

Prodded by:	Danny Carroll <dannycarroll@hotmail.com>
MFC after:	1 week
2001-11-27 10:50:23 +00:00
Ruslan Ermilov
8ba0396688 Restore the ability to use IP_FW_ADD with setsockopt(2) that got
broken in revision 1.86.  This broke natd(8)'s -punch_fw option.

Reported by:	Daniel Rock <D.Rock@t-online.de>,
		setantae <setantae@submonkey.net>
2001-11-26 10:05:58 +00:00
Bruce Evans
419d3454b1 Fixed a buffer overrun. In my kernel configuration, tcp_syncache happens
to be followed by nfsnodehashtbl, so bzeroing callouts beyond the end of
tcp_syncache soon caused a null pointer panic when nfsnodehashtbl was
accessed.
2001-11-23 12:31:27 +00:00
Jonathan Lemon
be2ac88c59 Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby  (in a previous iteration)
Sponsored by: DARPA, NAI Labs
2001-11-22 04:50:44 +00:00
Jonathan Lemon
d00fd2011d Move initialization of snd_recover into tcp_sendseqinit(). 2001-11-21 18:45:51 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
Robert Watson
ce17880650 o Replace reference to 'struct proc' with 'struct thread' in 'struct
sysctl_req', which describes in-progress sysctl requests.  This permits
  sysctl handlers to have access to the current thread, permitting work
  on implementing td->td_ucred, migration of suser() to using struct
  thread to derive the appropriate ucred, and allowing struct thread to be
  passed down to other code, such as network code where td is not currently
  available (and curproc is used).

o Note: netncp and netsmb are not updated to reflect this change, as they
  are not currently KSE-adapted.

Reviewed by:		julian
Obtained from:	TrustedBSD Project
2001-11-08 02:13:18 +00:00
Andrew R. Reiter
83103a7397 - Fixes non-zero'd out sin_zero field problem so that the padding
is used as it is supposed to be.

Inspired by: PR #31704
Approved by: jdp
Reviewed by: jhb, -net@
2001-11-06 00:48:01 +00:00
Poul-Henning Kamp
d3c64689d8 3.5 years ago Wollman wrote:
"[...] and removes the hostcache code from standard kernels---the
   code that depends on it is not going to happen any time soon,
   I'm afraid."
Time to clean up.
2001-11-05 21:25:02 +00:00
Luigi Rizzo
7b109fa404 MFS: sync the ipfw/dummynet/bridge code with the one recently merged
into stable (mostly , but not only, formatting and comments changes).
2001-11-04 22:56:25 +00:00
Luigi Rizzo
09b2ca212b s/FREE/free/ 2001-11-04 17:35:31 +00:00
Brian Somers
e83aaae350 cmott@scientech.com -> cm@linktel.net
Requested by:	Charles Mott <cmott@scientech.com>
2001-11-03 11:34:09 +00:00
Bill Paul
3528d68f71 Fix a (long standing?) bug in ip_output(): if ip_insertoptions() is
called and ip_output() encounters an error and bails (i.e. host
unreachable), we will leak an mbuf. This is because the code calls
m_freem(m0) after jumping to the bad: label at the end of the function,
when it should be calling m_freem(m). (m0 is the original mbuf list
_without_ the options mbuf prepended.)

Obtained from:	NetBSD
2001-10-30 18:15:48 +00:00
Dag-Erling Smørgrav
bc183b3fe8 Make sure the netmask always has an address family. This fixes Linux
ifconfig, which expects the address returned by the SIOCGIFNETMASK ioctl
to have a valid sa_family.  Similar changes may be necessary for IPv6.

While we're here, get rid of an unnecessary temp variable.

MFC after:	2 weeks
2001-10-30 15:57:20 +00:00
Jonathan Lemon
35609d458d When dropping a packet because there is no room in the queue (which itself
is somewhat bogus), update the statistics to indicate something was dropped.

PR: 13740
2001-10-30 14:58:27 +00:00
Josef Karthauser
06dae58b17 A few more style changes picked up whilst working on an MFC to -stable. 2001-10-29 15:09:07 +00:00
Josef Karthauser
f227535cd8 Fix some whitespace, and a comment that I missed in the last commit. 2001-10-29 14:08:51 +00:00
Josef Karthauser
25549c009a Clean up the style of this header file. 2001-10-29 04:41:28 +00:00
Matthew Dillon
2326da5db5 fix int argument used in printf w/ %ld (cast to long) 2001-10-29 02:19:19 +00:00
Jonathan Lemon
0751407193 Don't use the ip_timestamp structure to access timestamp options, as the
compiler may cause an unaligned access to be generated in some cases.

PR: 30982
2001-10-25 06:27:51 +00:00
Jonathan Lemon
ec691a10e6 If we are bridging, fall back to using any inet address in the system,
irrespective of receive interface, as a last resort.

Submitted by: ru
2001-10-25 06:14:21 +00:00
Jonathan Lemon
807b8338ba Relocate the KASSERT for a null recvif to a location where it will
actually do some good.

Pointed out by: ru
2001-10-25 05:56:30 +00:00
Hajimu UMEMOTO
cb34210012 restore the data of the ip header when extended udp header and data checksum
is calculated.  this caused some trouble in the code which the ip header
is not modified.  for example, inbound policy lookup failed.

Obtained from:	KAME
MFC after:	1 week
2001-10-22 12:43:30 +00:00
Jonathan Lemon
d8b84d9e07 Only examine inet addresses of the interface. This was broken in r1.83,
with the result that the system would reply to an ARP request of 0.0.0.0
2001-10-20 05:14:06 +00:00
Ruslan Ermilov
8071913df2 Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument.  Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest.  3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works.  Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''.  It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from:	BSD/OS, NetBSD
MFC after:	1 month
PR:		kern/28360
2001-10-17 18:07:05 +00:00
Max Khon
322dcb8d3d bring in ARP support for variable length link level addresses
Reviewed by:	jdp
Approved by:	jdp
Obtained from:	NetBSD
MFC after:	6 weeks
2001-10-14 20:17:53 +00:00
Robert Watson
8a7d8cc675 - Combine kern.ps_showallprocs and kern.ipc.showallsockets into
a single kern.security.seeotheruids_permitted, describes as:
  "Unprivileged processes may see subjects/objects with different real uid"
  NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is
  an API change.  kern.ipc.showallsockets does not.
- Check kern.security.seeotheruids_permitted in cr_cansee().
- Replace visibility calls to socheckuid() with cr_cansee() (retain
  the change to socheckuid() in ipfw, where it is used for rule-matching).
- Remove prison_unpcb() and make use of cr_cansee() against the UNIX
  domain socket credential instead of comparing root vnodes for the
  UDS and the process.  This allows multiple jails to share the same
  chroot() and not see each others UNIX domain sockets.
- Remove unused socheckproc().

Now that cr_cansee() is used universally for socket visibility, a variety
of policies are more consistently enforced, including uid-based
restrictions and jail-based restrictions.  This also better-supports
the introduction of additional MAC models.

Reviewed by:	ps, billf
Obtained from:	TrustedBSD Project
2001-10-09 21:40:30 +00:00
Jayanth Vijayaraghavan
c24d5dae7a Add a flag TF_LASTIDLE, that forces a previously idle connection
to send all its data, especially when the data is less than one MSS.
This fixes an issue where the stack was delaying the sending
of data, eventhough there was enough window to send all the data and
the sending of data was emptying the socket buffer.

Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)

Submitted by: Jayanth Vijayaraghavan
2001-10-05 21:33:38 +00:00
Paul Saab
4787fd37af Only allow users to see their own socket connections if
kern.ipc.showallsockets is set to 0.

Submitted by:	billf (with modifications by me)
Inspired by:	Dave McKay (aka pm aka Packet Magnet)
Reviewed by:	peter
MFC after:	2 weeks
2001-10-05 07:06:32 +00:00
Paul Saab
db69a05dce Make it so dummynet and bridge can be loaded as modules.
Submitted by:	billf
2001-10-05 05:45:27 +00:00
Jonathan Lemon
22c819a73a in_ifinit apparently can be used to rewrite an ip address; recalculate
the correct hash bucket for the entry.

Submitted by: iedowse  (with some munging by me)
2001-10-01 18:07:08 +00:00
Luigi Rizzo
cc33247e33 Fix a problem with unnumbered rules introduced in latest commit.
Reported by: des
2001-10-01 17:35:54 +00:00
Ruslan Ermilov
32eef9aeb1 mdoc(7) police: Use the new .In macro for #include statements. 2001-10-01 16:09:29 +00:00
Matthew Dillon
e2505aa676 Add __FBSDID's to libalias 2001-09-30 21:03:33 +00:00
Jonathan Lemon
01a5f19070 Nuke unused (and incorrect) #define of INADDR_HMASK.
Spotted by: ru
2001-09-29 14:59:20 +00:00
Jonathan Lemon
a931d7ed29 Make the INADDR_TO_IFP macro use the IP address hash lookup instead of
walking the entire list of IP addresses.

Pointed out by: bfumerola
2001-09-29 06:16:02 +00:00
Jonathan Lemon
ca925d9c17 Add a hash table that contains the list of internet addresses, and use
this in place of the in_ifaddr list when appropriate.  This improves
performance on hosts which have a large number of IP aliases.
2001-09-29 04:34:11 +00:00
Jonathan Lemon
9a10980e2a Centralize satosin(), sintosa() and ifatoia() macros in <netinet/in.h>
Remove local definitions.
2001-09-29 03:23:44 +00:00
Luigi Rizzo
830cc17841 Two main changes here:
+ implement "limit" rules, which permit to limit the number of sessions
   between certain host pairs (according to masks). These are a special
   type of stateful rules, which might be of interest in some cases.
   See the ipfw manpage for details.

 + merge the list pointers and ipfw rule descriptors in the kernel, so
   the code is smaller, faster and more readable. This patch basically
   consists in replacing "foo->rule->bar" with "rule->bar" all over
   the place.
   I have been willing to do this for ages!

MFC after: 1 week
2001-09-27 23:44:27 +00:00
Luigi Rizzo
832d32eb5b Remove unused (and duplicate) struct ip_opts which is never used,
not referenced in Stevens, and does not compile with g++.
There is an equivalent structure, struct ipoption in ip_var.h
which is actually used in various parts of the kernel, and also referenced
in Stevens.

Bill Fenner also says:
... if you want the trivia, struct ip_opts was introduced
in in.h SCCS revision 7.9, on 6/28/1990, by Mike Karels.
struct ipoption was introduced in ip_var.h SCCS revision 6.5,
on 9/16/1985, by... Mike Karels.

MFC-after: 3 days
2001-09-27 11:53:22 +00:00
Brooks Davis
49c024e373 Include sys/proc.h for the definition of securelevel_ge().
Submitted by:	LINT
2001-09-26 21:53:20 +00:00
Robert Watson
785f9ffca3 o Modify IPFW and DUMMYNET administrative setsockopt() calls to use
securelevel_gt() to check the securelevel, rather than direct access
  to the securelevel variable.

Obtained from:	TrustedBSD Project
2001-09-26 19:58:29 +00:00
Brooks Davis
9494d5968f Make faith loadable, unloadable, and clonable. 2001-09-25 18:40:52 +00:00
Luigi Rizzo
078156d09d Fix a null pointer dereference introduced in the last commit, plus
remove a useless assignment and move a comment.

Submitted by: Thomas Moestl
2001-09-24 05:24:19 +00:00
Ruslan Ermilov
c1dd00f75c Fixed the bug that prevented communication with FTP servers behind
NAT in extended passive mode if the server's public IP address was
different from the main NAT address.  This caused a wrong aliasing
link to be created that did not route the incoming packets back to
the original IP address of the server.

	natd -v -n pub0 -redirect_address localFTP publicFTP

Note that even if localFTP == publicFTP, one still needs to supply
the -redirect_address directive.  It is needed as a helper because
extended passive mode's 229 reply does not contain the IP address.

MFC after:	1 week
2001-09-21 14:38:36 +00:00
Robert Watson
94088977c9 o Rename u_cansee() to cr_cansee(), making the name more comprehensible
in the face of a rename of ucred to cred, and possibly generally.

Obtained from:	TrustedBSD Project
2001-09-20 21:45:31 +00:00
Luigi Rizzo
32f967a3c0 A bunch of minor changes to the code (see below) for readability, code size
and speed. No new functionality added (yet) apart from a bugfix.
MFC will occur in due time and probably in stages.

BUGFIX: fix a problem in old code which prevented reallocation of
the hash table for dynamic rules (there is a PR on this).

OTHER CHANGES: minor changes to the internal struct for static and dynamic rules.
Requires rebuild of ipfw binary.

Add comments to show how data structures are linked together.
(It probably makes no sense to keep the chain pointers separate
from actual rule descriptors. They will be hopefully merged soon.

keep a (sysctl-readable) counter for the number of static rules,
to speed up IP_FW_GET operations

initial support for a "grace time" for expired connections, so we
can set timeouts for closing connections to much shorter times.

merge zero_entry() and resetlog_entry(), they use basically the
same code.

clean up and reduce replication of code for removing rules,
both for readability and code size.

introduce a separate lifetime for dynamic UDP rules.

fix a problem in old code which prevented reallocation of
the hash table for dynamic rules (PR ...)

restructure dynamic rule descriptors

introduce some local variables to avoid multiple dereferencing of
pointer chains (reduces code size and hopefully increases speed).
2001-09-20 13:52:49 +00:00
Munechika SUMIKAWA
862e52ea61 Fixed comment: ipip_input -> mroute_encapcheck.
Reported by:	bde
2001-09-20 07:59:45 +00:00
Munechika SUMIKAWA
33ae84b7c6 Removed ipip_input(). No codes calls it anymore due to ip_encap.c's
encapsulation support.
2001-09-18 14:52:20 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Julian Elischer
aa1489d4fa Remove some un-needed code that was accidentally included in
the 2nd previous KAME patch.

Submitted by:	SUMIKAWA Munechika <sumikawa@ebina.hitachi.co.jp>
2001-09-07 07:24:28 +00:00
Julian Elischer
ff265614c1 Patches from KAME to remove usage of Varargs in existing
IPV4 code. For now they will still have some in the developing stuff (IPv6)

Submitted by:	Keiichi SHIMA / <keiichi@iij.ad.jp>
Obtained from:	KAME
2001-09-07 07:19:12 +00:00
Jonathan Lemon
f9132cebdc Wrap array accesses in macros, which also happen to be lvalues:
ifnet_addrs[i - 1]  -> ifaddr_byindex(i)
        ifindex2ifnet[i]    -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
2001-09-06 02:40:43 +00:00
Alfred Perlstein
75ce322136 Fix sysctl comment field, s/the the/then the
Pointed out by: ru
2001-09-04 15:25:23 +00:00
Alfred Perlstein
e3d123d63d Allow disabling of "arp moved" messages.
Submitted by: Stephen Hurd <deuce@lordlegacy.org>
2001-09-03 21:53:15 +00:00
Julian Elischer
4d2c57188f I really hope this is the right answer.
call ip_input directly but take the offset off the
packet first if it's an IPV4 packet encapsulated.
2001-09-03 21:07:31 +00:00
Julian Elischer
7dd66b4ad8 Call ip_input() instead of ipip_input()
when decoding encapsulated ipv4 packets.
(allows line to compile again)
2001-09-03 20:55:35 +00:00
Julian Elischer
2e4f1ee934 One caller of rip_input failed to be converted in the last commit. 2001-09-03 20:40:35 +00:00
Julian Elischer
f0ffb944d2 Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.
2001-09-03 20:03:55 +00:00
Jayanth Vijayaraghavan
e7e2b80184 when newreno is turned on, if dupacks = 1 or dupacks = 2 and
new data is acknowledged, reset the dupacks to 0.
The problem was spotted when a connection had its send buffer full
because the congestion window was only 1 MSS and was not being incremented
because dupacks was not reset to 0.

Obtained from:		Yahoo!
2001-08-29 23:54:13 +00:00
Jesper Skriver
3b8123b72c When net.inet.tcp.icmp_may_rst is enabled, report ECONNREFUSED not ENETRESET
to the application as a RST would, this way we're compatible with the most
applications.

MFC candidate.

Submitted by:	Scott Renfro <scott@renfro.org>
Reviewed by:	Mike Silbersack <silby@silby.com>
2001-08-27 22:10:07 +00:00
Bill Fumerola
52cf11d8a1 the IP_FW_GET code in ip_fw_ctl() sizes a buffer to hold information
about rules and dynamic rules. it later fills this buffer with these
rules.

it also takes the opporunity to compare the expiration of the dynamic
rules with the current time and either marks them for deletion or simply
charges the countdown.

unfortunatly it does this all (the sizing, the buffer copying, and the
expiration GC) with no spl protection whatsoever. it was possible for
the dynamic rule(s) to be ripped out from under the request before it
had completed, resulting in corrupt memory dereferencing.

Reviewed by:	ps
MFC before:	4.4-RELEASE, hopefully.
2001-08-26 10:09:47 +00:00
Dima Dorfman
745bab7f84 Correct a typo in a comment: FIN_WAIT2 -> FIN_WAIT_2
PR:		29970
Submitted by:	Joseph Mallett <jmallett@xMach.org>
2001-08-23 22:34:29 +00:00
Mike Silbersack
b0e3ad758b Much delayed but now present: RFC 1948 style sequence numbers
In order to ensure security and functionality, RFC 1948 style
initial sequence number generation has been implemented.  Barring
any major crypographic breakthroughs, this algorithm should be
unbreakable.  In addition, the problems with TIME_WAIT recycling
which affect our currently used algorithm are not present.

Reviewed by: jesper
2001-08-22 00:58:16 +00:00
Ruslan Ermilov
d86293dbea Added TFTP support.
Submitted by:	Joe Clarke <marcus@marcuscom.com>
MFC after:	2 weeks
2001-08-21 16:25:38 +00:00
Ruslan Ermilov
04c3e33949 Close the "IRC DCC" security breach reported recently on Bugtraq.
Submitted by:	Makoto MATSUSHITA <matusita@jp.FreeBSD.org>
2001-08-21 11:21:08 +00:00
Brian Somers
f68e0a68d8 Make the copyright consistent.
Previously approved by:	Charles Mott <cmott@scientech.com>
2001-08-20 22:57:33 +00:00
Brian Somers
7806546c39 Handle snprintf() returning -1
MFC after: 2 weeks
2001-08-20 12:06:42 +00:00
Julian Elischer
2b6a0c4fcd Make the protoswitch definitiosn checkable in the same way that
cdevsw entries have been for a long time.
Discover that we now have two version sof the same structure.
I will shoot one of them shortly when I figure out why someone thinks
they need it. (And I can prove they don't)
(netinet/ipprotosw.h should GO AWAY)
2001-08-10 23:17:22 +00:00
Ruslan Ermilov
c4d9468ea0 mdoc(7) police:
Avoid using parenthesis enclosure macros (.Pq and .Po/.Pc) with plain text.
Not only this slows down the mdoc(7) processing significantly, but it also
has an undesired (in this case) effect of disabling hyphenation within the
entire enclosed block.
2001-08-07 15:48:51 +00:00
Hajimu UMEMOTO
e43cc4ae36 When running aplication joined multicast address,
removing network card, and kill aplication.
imo_membership[].inm_ifp refer interface pointer
after removing interface.
When kill aplication, release socket,and imo_membership.
imo_membership use already not exist interface pointer.
Then, kernel panic.

PR:		29345
Submitted by:	Inoue Yuichi <inoue@nd.net.fujitsu.co.jp>
Obtained from:	KAME
MFC after:	3 days
2001-08-04 17:10:14 +00:00
Daniel C. Sobral
07203494d2 MFS: Avoid dropping fragments in the absence of an interface address.
Noticed by:	fenner
Submitted by:	iedowse
Not committed to current by:	iedowse ;-)
2001-08-03 17:36:06 +00:00
Peter Wemm
57e119f6f2 Fix a warning. 2001-07-27 00:04:39 +00:00
Peter Wemm
016517247f Patch up some style(9) stuff in tcp_new_isn() 2001-07-27 00:03:49 +00:00
Peter Wemm
92971bd3f1 s/OpemBSD/OpenBSD/ 2001-07-27 00:01:48 +00:00
Hajimu UMEMOTO
13cf67f317 move ipsec security policy allocation into in_pcballoc, before
making pcbs available to the outside world.  otherwise, we will see
inpcb without ipsec security policy attached (-> panic() in ipsec.c).

Obtained from:	KAME
MFC after:	3 days
2001-07-26 19:19:49 +00:00
Bill Fenner
3f2e902a15 Somewhat modernize ip_mroute.c:
- Use sysctl to export stats
- Use ip_encap.c's encapsulation support
- Update lkm to kld (is 6 years a record for a broken module?)
- Remove some unused cruft
2001-07-25 20:15:49 +00:00
Ruslan Ermilov
38c1bc358b Avoid a NULL pointer derefence introduced in rev. 1.129.
Problem noticed by:	bde, gcc(1)
Panic caught by:	mjacob
Patch tested by:	mjacob
2001-07-23 16:50:01 +00:00
Ruslan Ermilov
f2c2962ee5 Backout non-functional changes from revision 1.128.
Not objected to by:	dcs
2001-07-19 07:10:30 +00:00
Daniel C. Sobral
3afefa3924 Skip the route checking in the case of multicast packets with known
interfaces.

Reviewed by:	people at that channel
Approved by:	silence on -net
2001-07-17 18:47:48 +00:00
Ruslan Ermilov
9f81cc840b Backout damage to the INADDR_TO_IFP() macro in revision 1.7.
This macro was supposed to only match local IP addresses of
interfaces, and all consumers of this macro assume this as
well.  (See IP_MULTICAST_IF and IP_ADD_MEMBERSHIP socket
options in the ip(4) manpage.)

This fixes a major security breach in IPFW-based firewalls
where the `me' keyword would match the other end of a P2P
link.

PR:		kern/28567
2001-07-17 10:30:21 +00:00
David E. O'Brien
81e561cdf2 Bump net.inet.tcp.sendspace to 32k and net.inet.tcp.recvspace to 65k.
This should help us in nieve benchmark "tests".

It seems a wide number of people think 32k buffers would not cause major
issues, and is in fact in use by many other OS's at this time.  The
receive buffers can be bumped higher as buffers are hardly used and several
research papers indicate that receive buffers rarely use much space at all.

Submitted by:			Leo Bicknell <bicknell@ufp.org>
				<20010713101107.B9559@ussenterprise.ufp.org>
Agreed to in principle by:	dillon (at the 32k level)
2001-07-13 18:38:04 +00:00
Ruslan Ermilov
a307d59838 mdoc(7) police: removed HISTORY info from the .Os call. 2001-07-10 13:41:46 +00:00
Mike Silbersack
2d610a5028 Temporary feature: Runtime tuneable tcp initial sequence number
generation scheme.  Users may now select between the currently used
OpenBSD algorithm and the older random positive increment method.

While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT
handling; this is causing trouble for an increasing number of folks.

To switch between generation schemes, one sets the sysctl
net.inet.tcp.tcp_seq_genscheme.  0 = random positive increments,
1 = the OpenBSD algorithm.  1 is still the default.

Once a secure _and_ compatible algorithm is implemented, this sysctl
will be removed.

Reviewed by: jlemon
Tested by: numerous subscribers of -net
2001-07-08 02:20:47 +00:00
Brooks Davis
53dab5fe7b gif(4) and stf(4) modernization:
- Remove gif dependencies from stf.
 - Make gif and stf into modules
 - Make gif cloneable.

PR:		kern/27983
Reviewed by:	ru, ume
Obtained from:	NetBSD
MFC after:	1 week
2001-07-02 21:02:09 +00:00
Crist J. Clark
92a99815a8 While in there fixing a fragment logging bug, fix it so we log
fragments "right." Log fragment information tcpdump(8)-style,

   Jul  1 19:38:45 bubbles /boot/kernel/kernel: ipfw: 1000 Accept ICMP:8.0 192.168.64.60 192.168.64.20 in via ep0 (frag 53113:1480@0+)

That is, instead of the old,

  ... Fragment = <offset/8>

Do,

  ... (frag <IP ID>:<data len>@<offset>[+])

PR:		kern/23446
Approved by:	ru
MFC after:	1 week
2001-07-02 15:50:31 +00:00
Ruslan Ermilov
8bf82a92d5 Backout CSRG revision 7.22 to this file (if in_losing notices an
RTF_DYNAMIC route, it got freed twice).  I am not sure what was
the actual problem in 1992, but the current behavior is memory
leak if PCB holds a reference to a dynamically created/modified
routing table entry.  (rt_refcnt>0 and we don't call rtfree().)

My test bed was:

1.  Set net.inet.tcp.msl to a low value (for test purposes), e.g.,
    5 seconds, to speed up the transition of TCP connection to a
    "closed" state.
2.  Add a network route which causes ICMP redirect from the gateway.
3.  ping(8) host H that matches this route; this creates RTF_DYNAMIC
    RTF_HOST route to H.  (I was forced to use ICMP to cause gateway
    to generate ICMP host redirect, because gateway in question is a
    4.2-STABLE system vulnerable to a problem that was fixed later in
    ip_icmp.c,v 1.39.2.6, and TCP packets with DF bit set were
    triggering this bug.)
4.  telnet(1) to H
5.  Block access to H with ipfw(8)
6.  Send something in telnet(1) session; this causes EPERM, followed
    by an in_losing() call in a few seconds.
7.  Delete ipfw(8) rule blocking access to H, and wait for TCP
    connection moving to a CLOSED state; PCB is freed.
8.  Delete host route to H.
9.  Watch with netstat(1) that `rttrash' increased.
10. Repeat steps 3-9, and watch `rttrash' increases.

PR:		kern/25421
MFC after:	2 weeks
2001-06-29 12:07:29 +00:00
Ruslan Ermilov
3277d1c498 Fixed the brain-o in rev. 1.10: the logic check was reversed.
Reported by:	Bernd Fuerwitt <bf@fuerwitt.de>
2001-06-27 14:11:25 +00:00
Ruslan Ermilov
a447a5ae06 Bring in fix from NetBSD's revision 1.16:
Pass the correct destination address for the route-to-gateway case.

PR:		kern/10607
MFC after:	2 weeks
2001-06-26 09:00:50 +00:00
David Malone
7ce87f1205 Allow getcred sysctl to work in jailed root processes. Processes can
only do getcred calls for sockets which were created in the same jail.
This should allow the ident to work in a reasonable way within jails.

PR:		28107
Approved by:	des, rwatson
2001-06-24 12:18:27 +00:00
Jonathan Lemon
f962cba5c3 Replace bzero() of struct ip with explicit zeroing of structure members,
which is faster.
2001-06-23 17:44:27 +00:00
Ruslan Ermilov
c73d99b567 Add netstat(1) knob to reset net.inet.{ip|icmp|tcp|udp|igmp}.stats.
For example, ``netstat -s -p ip -z'' will show and reset IP stats.

PR:		bin/17338
2001-06-23 17:17:59 +00:00
Mike Silbersack
08517d530e Eliminate the allocation of a tcp template structure for each
connection.  The information contained in a tcptemp can be
reconstructed from a tcpcb when needed.

Previously, tcp templates required the allocation of one
mbuf per connection.  On large systems, this change should
free up a large number of mbufs.

Reviewed by:	bmilekic, jlemon, ru
MFC after: 2 weeks
2001-06-23 03:21:46 +00:00
Munechika SUMIKAWA
a96c00661a - Renumber KAME local ICMP types and NDP options numberes beacaues they
are duplicated by newly defined types/options in RFC3121
- We have no backward compatibility issue. There is no apps in our
  distribution which use the above types/options.

Obtained from:	KAME
MFC after:	2 weeks
2001-06-21 07:08:43 +00:00
Hajimu UMEMOTO
ff2428299f made sure to use the correct sa_len for rtalloc().
sizeof(ro_dst) is not necessarily the correct one.
this change would also fix the recent path MTU discovery problem for the
destination of an incoming TCP connection.

Submitted by:	JINMEI Tatuya <jinmei@kame.net>
Obtained from:	KAME
MFC after:	2 weeks
2001-06-20 12:32:48 +00:00
Jonathan Lemon
08aadfbb98 Do not perform arp send/resolve on an interface marked NOARP.
PR: 25006
MFC after: 2 weeks
2001-06-15 21:00:32 +00:00
Peter Wemm
215db1379e Fix a stack of KAME netinet6/in6.h warnings:
592: warning: `struct mbuf' declared inside parameter list
595: warning: `struct ifnet' declared inside parameter list
2001-06-15 00:37:27 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
Jesper Skriver
96c2b04290 Make the default value of net.inet.ip.maxfragpackets and
net.inet6.ip6.maxfragpackets dependent on nmbclusters,
defaulting to nmbclusters / 4

Reviewed by:	bde
MFC after:	1 week
2001-06-10 11:04:10 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Jonathan Lemon
0a52f59c36 Move IPFilter into contrib. 2001-06-07 05:13:35 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
Jesper Skriver
65f28919b3 Silby's take one on increasing FreeBSD's resistance to SYN floods:
One way we can reduce the amount of traffic we send in response to a SYN
flood is to eliminate the RST we send when removing a connection from
the listen queue.  Since we are being flooded, we can assume that the
majority of connections in the queue are bogus.  Our RST is unwanted
by these hosts, just as our SYN-ACK was.  Genuine connection attempts
will result in hosts responding to our SYN-ACK with an ACK packet.  We
will automatically return a RST response to their ACK when it gets to us
if the connection has been dropped, so the early RST doesn't serve the
genuine class of connections much.  In summary, we can reduce the number
of packets we send by a factor of two without any loss in functionality
by ensuring that RST packets are not sent when dropping a connection
from the listen queue.

Submitted by:	Mike Silbersack <silby@silby.com>
Reviewed by:	jesper
MFC after:	2 weeks
2001-06-06 19:41:51 +00:00
Brian Somers
f987e1bd0f Add BSD-style copyright headers
Approved by: Charles Mott <cmott@scientech.com>
2001-06-04 15:09:51 +00:00
Brian Somers
888b1a7aa5 Change to a standard BSD-style copyright
Approved by:	Atsushi Murai <amurai@spec.co.jp>
2001-06-04 14:52:17 +00:00
Jesper Skriver
690a6055ff Prevent denial of service using bogus fragmented IPv4 packets.
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.

By setting a upper limit on the number of reassembly queues we
prevent this situation.

This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to 200,
as the IPv6 case, this should be sufficient for most
systmes, but you might want to increase it if you have
lots of TCP sessions.
I'm working on making the default value dependent on
nmbclusters.

If you want old behaviour (no upper limit) set this sysctl
to a negative value.

If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero).

Obtained from:	NetBSD
MFC after:	1 week
2001-06-03 23:33:23 +00:00
Kris Kennaway
64dddc1872 Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.

Reviewed by:    -net
Obtained from:  OpenBSD
2001-06-01 10:02:28 +00:00
David E. O'Brien
240ef84277 Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile. 2001-06-01 09:51:14 +00:00
Jesper Skriver
2b1a209a17 Prevent denial of service using bogus fragmented IPv4 packets.
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.

By setting a upper limit on the number of reassembly queues we
prevent this situation.

This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to NMBCLUSTERS/4

If you want old behaviour (no upper limit) set this sysctl
to a negative value.

If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero)

Obtained from:	NetBSD (partially)
MFC after:	1 week
2001-05-31 21:57:29 +00:00
Jesper Skriver
7ceb778366 Disable rfc1323 and rfc1644 TCP extensions if we havn't got
any response to our third SYN to work-around some broken
terminal servers (most of which have hopefully been retired)
that have bad VJ header compression code which trashes TCP
segments containing unknown-to-them TCP options.

PR:		kern/1689
Submitted by:	jesper
Reviewed by:	wollman
MFC after:	2 weeks
2001-05-31 19:24:49 +00:00
Ruslan Ermilov
79ec1c507a Add an integer field to keep protocol-specific flags with links.
For FTP control connection, keep the CRLF end-of-line termination
status in there.

Fixed the bug when the first FTP command in a session was ignored.

PR:		24048
MFC after:	1 week
2001-05-30 14:24:35 +00:00
Jesper Skriver
e4b6428171 Inline TCP_REASS() in the single location where it's used,
just as OpenBSD and NetBSD has done.

No functional difference.

MFC after:	2 weeks
2001-05-29 19:54:45 +00:00
Jesper Skriver
853be1226e properly delay acks in half-closed TCP connections
PR:	24962
Submitted by:	Tony Finch <dot@dotat.at>
MFC after:	2 weeks
2001-05-29 19:51:45 +00:00
Ruslan Ermilov
9185426827 In in_ifadown(), differentiate between whether the interface goes
down or interface address is deleted.  Only delete static routes
in the latter case.

Reported by:	Alexander Leidinger <Alexander@leidinger.net>
2001-05-11 14:37:34 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Jesper Skriver
d1745f454d Say goodbye to TCP_COMPAT_42
Reviewed by:	wollman
Requested by:	wollman
2001-04-20 11:58:56 +00:00
Kris Kennaway
f0a04f3f51 Randomize the TCP initial sequence numbers more thoroughly.
Obtained from:	OpenBSD
Reviewed by:	jesper, peter, -developers
2001-04-17 18:08:01 +00:00
Darren Reed
454a43c1f1 fix security hole created by fragment cache 2001-04-06 15:52:28 +00:00
Bill Fumerola
0901f62e11 pipe/queue are the only consumers of flow_id, so only set it in those cases 2001-04-06 06:52:25 +00:00
Jesper Skriver
b77d155dd3 MFC candidate.
Change code from PRC_UNREACH_ADMIN_PROHIB to PRC_UNREACH_PORT for
ICMP_UNREACH_PROTOCOL and ICMP_UNREACH_PORT

And let TCP treat PRC_UNREACH_PORT like PRC_UNREACH_ADMIN_PROHIB

This should fix the case where port unreachables for udp returned
ENETRESET instead of ECONNREFUSED

Problem found by:	Bill Fenner <fenner@research.att.com>
Reviewed by:		jlemon
2001-03-28 14:13:19 +00:00
Ruslan Ermilov
4a558355e5 MAN[1-9] -> MAN. 2001-03-27 17:27:19 +00:00
Yaroslav Tykhiy
4cbc8ad1bb Add a missing m_pullup() before a mtod() in in_arpinput().
PR: kern/22177
Reviewed by: wollman
2001-03-27 12:34:58 +00:00
Hidetoshi Shimokawa
110a013333 Replace dyn_fin_lifetime with dyn_ack_lifetime for half-closed state.
Half-closed state could last long for some connections and fin_lifetime
(default 20sec) is too short for that.

OK'ed by: luigi
2001-03-27 05:28:30 +00:00
Poul-Henning Kamp
f83880518b Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.
2001-03-26 12:41:29 +00:00
Brian Somers
71593f95e0 Make header files conform to style(9).
Reviewed by (*): bde

(*) alias_local.h only got a cursory glance.
2001-03-25 12:05:10 +00:00
Brian Somers
adad9908fa Remove an extraneous declaration. 2001-03-25 03:34:29 +00:00
Hajimu UMEMOTO
2da24fa6e9 IPv4 address is not unsigned int. This change introduces in_addr_t.
PR:		9982
Adviced by:	des
Reviewed by:	-alpha and -net (no objection)
Obtained from:	OpenBSD
2001-03-23 18:59:31 +00:00
Brian Somers
30fcf11451 Remove (non-protected) variable names from function prototypes. 2001-03-22 11:55:26 +00:00
Paul Richards
1789d85615 Only flush rules that have a rule number above that set by a new
sysctl, net.inet.ip.fw.permanent_rules.

This allows you to install rules that are persistent across flushes,
which is very useful if you want a default set of rules that
maintains your access to remote machines while you're reconfiguring
the other rules.

Reviewed by:	Mark Murray <markm@FreeBSD.org>
2001-03-21 08:19:31 +00:00
Dag-Erling Smørgrav
c59319bf1a Axe TCP_RESTRICT_RST. It was never a particularly good idea except for a few
very specific scenarios, and now that we have had net.inet.tcp.blackhole for
quite some time there is really no reason to use it any more.

(last of three commits)
2001-03-19 22:09:00 +00:00
Ruslan Ermilov
1e3d5af041 Invalidate cached forwarding route (ipforward_rt) whenever a new route
is added to the routing table, otherwise we may end up using the wrong
route when forwarding.

PR:		kern/10778
Reviewed by:	silence on -net
2001-03-19 09:16:16 +00:00
Ruslan Ermilov
4078ffb154 Make sure the cached forwarding route (ipforward_rt) is still up before
using it.  Not checking this may have caused the wrong IP address to be
used when processing certain IP options (see example below).  This also
caused the wrong route to be passed to ip_output() when forwarding, but
fortunately ip_output() is smart enough to detect this.

This example demonstrates the wrong behavior of the Record Route option
observed with this bug.  Host ``freebsd'' is acting as the gateway for
the ``sysv''.

1. On the gateway, we add the route to the destination.  The new route
   will use the primary address of the loopback interface, 127.0.0.1:

:  freebsd# route add 10.0.0.66 -iface lo0 -reject
:  add host 10.0.0.66: gateway lo0

2. From the client, we ping the destination.  We see the correct replies.
   Please note that this also causes the relevant route on the ``freebsd''
   gateway to be cached in ipforward_rt variable:

:  sysv# ping -snv 10.0.0.66
:  PING 10.0.0.66: 56 data bytes
:  ICMP Host Unreachable from gateway 192.168.0.115
:  ICMP Host Unreachable from gateway 192.168.0.115
:  ICMP Host Unreachable from gateway 192.168.0.115
:
:  ----10.0.0.66 PING Statistics----
:  3 packets transmitted, 0 packets received, 100% packet loss

3. On the gateway, we delete the route to the destination, thus making
   the destination reachable through the `default' route:

:  freebsd# route delete 10.0.0.66
:  delete host 10.0.0.66

4. From the client, we ping destination again, now with the RR option
   turned on.  The surprise here is the 127.0.0.1 in the first reply.
   This is caused by the bug in ip_rtaddr() not checking the cached
   route is still up befor use.  The debug code also shows that the
   wrong (down) route is further passed to ip_output().  The latter
   detects that the route is down, and replaces the bogus route with
   the valid one, so we see the correct replies (192.168.0.115) on
   further probes:

:  sysv# ping -snRv 10.0.0.66
:  PING 10.0.0.66: 56 data bytes
:  64 bytes from 10.0.0.66: icmp_seq=0. time=10. ms
:    IP options:  <record route> 127.0.0.1, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:  64 bytes from 10.0.0.66: icmp_seq=1. time=0. ms
:    IP options:  <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:  64 bytes from 10.0.0.66: icmp_seq=2. time=0. ms
:    IP options:  <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:
:  ----10.0.0.66 PING Statistics----
:  3 packets transmitted, 3 packets received, 0% packet loss
:  round-trip (ms)  min/avg/max = 0/3/10
2001-03-18 13:04:07 +00:00
Poul-Henning Kamp
462b86fe91 <sys/queue.h> makeover. 2001-03-16 20:00:53 +00:00
Poul-Henning Kamp
ccd6f42dc9 Fix a style(9) nit. 2001-03-16 19:36:23 +00:00
Ruslan Ermilov
089cdfad78 net/route.c:
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
  set but did not have a reference to the parent route, as documented in
  the rtentry(9) manpage.  This prevented such routes from being deleted
  when their parent route is deleted.

  Now, for example, if you delete an IP address from a network interface,
  all ARP entries that were cloned from this interface route are flushed.

  This also has an impact on netstat(1) output.  Previously, dynamically
  created ARP cache entries (RTF_STATIC flag is unset) were displayed as
  part of the routing table display (-r).  Now, they are only printed if
  the -a option is given.

netinet/in.c, netinet/in_rmx.c:

  When address is removed from an interface, also delete all routes that
  point to this interface and address.  Previously, for example, if you
  changed the address on an interface, outgoing IP datagrams might still
  use the old address.  The only solution was to delete and re-add some
  routes.  (The problem is easily observed with the route(8) command.)

  Note, that if the socket was already bound to the local address before
  this address is removed, new datagrams generated from this socket will
  still be sent from the old address.

PR:		kern/20785, kern/21914
Reviewed by:	wollman (the idea)
2001-03-15 14:52:12 +00:00
Ruslan Ermilov
206a3274ef RFC768 (UDP) requires that "if the computed checksum is zero, it
is transmitted as all ones".  This got broken after introduction
of delayed checksums as follows.  Some guys (including Jonathan)
think that it is allowed to transmit all ones in place of a zero
checksum for TCP the same way as for UDP.  (The discussion still
takes place on -net.)  Thus, the 0 -> 0xffff checksum fixup was
first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65)
to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18,
INVERT expression).  Besides that I disagree that it is valid for
TCP, there was no real problem until in_cksum.c,v 1.20, where the
in_cksum() was made just a special version of in_cksum_skip().
The side effect was that now every incoming IP datagram failed to
pass the checksum test (in_cksum() returned 0xffff when it should
actually return zero).  It was fixed next day in revision 1.21,
by removing the INVERT expression.  The latter also broke the
0 -> 0xffff fixup for UDP checksums.

Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006:  udp 0 (ttl 64, id 1)
:                          4500 001c 0001 0000 4011 7cce 7f00 0001
:                          7f00 0001 80ed 80ee 0008 0000

After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006:  udp 0 (ttl 64, id 1)
:                          4500 001c 0001 0000 4011 7cce 7f00 0001
:                          7f00 0001 80ed 80ee 0008 ffff
2001-03-13 17:07:06 +00:00
Ruslan Ermilov
fb9aaba000 Count and show incoming UDP datagrams with no checksum. 2001-03-13 13:26:06 +00:00
Poul-Henning Kamp
503d3c0277 Correctly cleanup in case of failure to bind a pcb.
PR:		25751
Submitted by:	<unicorn@Forest.Od.UA>
2001-03-12 21:53:23 +00:00
Jonathan Lemon
1db24ffb98 Unbreak LINT.
Pointed out by: phk
2001-03-12 02:57:42 +00:00
Ian Dowse
5d936aa181 In ip_output(), initialise `ia' in the case where the packet has
come from a dummynet pipe. Without this, the code which increments
the per-ifaddr stats can dereference an uninitialised pointer. This
should make dummynet usable again.

Reported by:	"Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua>
Reviewed by:	luigi, joe
2001-03-11 17:50:19 +00:00
Ruslan Ermilov
8ce3f3dd28 Make it possible to use IP_TTL and IP_TOS setsockopt(2) options
on certain types of SOCK_RAW sockets.  Also, use the ip.ttl MIB
variable instead of MAXTTL constant as the default time-to-live
value for outgoing IP packets all over the place, as we already
do this for TCP and UDP.

Reviewed by:	wollman
2001-03-09 12:22:51 +00:00
Jonathan Lemon
c0647e0d07 Push the test for a disconnected socket when accept()ing down to the
protocol layer.  Not all protocols behave identically.  This fixes the
brokenness observed with unix-domain sockets (and postfix)
2001-03-09 08:16:40 +00:00
Jonathan Lemon
32676c2d1f The TCP sequence number used for sending a RST with the ipfw reset rule
is already in host byte order, so do not swap it again.

Reviewed by:	bfumerola
2001-03-09 08:13:08 +00:00
Ian Dowse
bfef7ed45c It was possible for ip_forward() to supply to icmp_error()
an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.

We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().

Also:
- Calculate the correct number of bytes that need to be
  retained for icmp_error(), instead of assuming that 64
  is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
  copy from the supplied mbuf chain, in case the first 8
  bytes of IP payload are not stored directly after the IP
  header.
- Sanity-check ip_len in icmp_error(), and panic if it is
  less than sizeof(struct ip). Incoming packets with bad
  ip_len values are discarded in ip_input(), so this should
  only be triggered by bugs in the code, not by bad packets.

This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.

Reported by:	Mike Tancsa <mike@sentex.net>
Reviewed by:	ru, bmilekic, jlemon, dillon
2001-03-08 19:03:26 +00:00
Don Lewis
a8f1210095 Modify the comments to more closely resemble the English language. 2001-03-05 22:40:27 +00:00
Don Lewis
3f67c83439 Move the loopback net check closer to the beginning of ip_input() so that
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.

Add warning comments about the ip_checkinterface feature.
2001-03-05 08:45:05 +00:00
Bosko Milekic
234ff7c46f During a flood, we don't call rtfree(), but we remove the entry ourselves.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.

Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).

This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)

Submitted by: Mike Silbersack <silby@silby.com>
2001-03-04 21:28:40 +00:00
Don Lewis
e15ae1b226 Disable interface checking for packets subject to "ipfw fwd".
Chris Johnson <cjohnson@palomine.net> tested this fix in -stable.
2001-03-04 03:22:36 +00:00
Don Lewis
823db0e9dd Disable interface checking when IP forwarding is engaged so that packets
addressed to the interface on the other side of the box follow their
historical path.

Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().

Always check the arrival interface when matching the packet destination
against the interface broadcast addresses.  This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side.  This was broken
when the directed broadcast code was removed from revision 1.32.  If
the directed broadcast code was stil present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.

Optimize the order of the tests.

Reviewed by:	jlemon
2001-03-04 01:39:19 +00:00
Jonathan Lemon
b3e95d4ed0 Add a new sysctl net.inet.ip.check_interface, which will verify that
an incoming packet arrivees on an interface that has an address matching
the packet's address.  This is turned on by default.
2001-03-02 20:54:03 +00:00
Poul-Henning Kamp
970680fad8 Fix jails. 2001-02-28 09:38:48 +00:00
Jonathan Lemon
7538a9a0f8 When iterating over our list of interface addresses in order to determine
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface.  Skip this check if the packet was locally
generated.
2001-02-27 19:43:14 +00:00
Bill Fumerola
2a6cb8804e The TCP header-specific section suffered a little bit of bitrot recently:
When we recieve a fragmented TCP packet (other than the first) we can't
extract header information (we don't have state to reference). In a rather
unelegant fashion we just move on and assume a non-match.

Recent additions to the TCP header-specific section of the code neglected
to add the logic to the fragment code so in those cases the match was
assumed to be positive and those parts of the rule (which should have
resulted in a non-match/continue) were instead skipped (which means
the processing of the rule continued even though it had already not
matched).

Fault can be spread out over Rich Steenbergen (tcpoptions) and myself
(tcp{seq,ack,win}).

rwatson sent me a patch that got me thinking about this whole situation
(but what I'm committing / this description is mine so don't blame him).
2001-02-27 10:20:44 +00:00
Jonathan Lemon
7d42e30c2e Use more aggressive retransmit timeouts for the initial SYN packet.
As we currently drop the connection after 4 retransmits + 2 ICMP errors,
this allows initial connection attempts to be dropped much faster.
2001-02-26 21:33:55 +00:00
Jonathan Lemon
c693a045de Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly.
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.

Clean up some layering violations (access to tcp internals from in_pcb)
2001-02-26 21:19:47 +00:00
Jeroen Ruigrok van der Werven
b9af273fe3 Remove struct full_tcpiphdr{}.
This piece of code has not been referenced since it was put there
in 1995.  Also done a codebased search on popular networking libraries
and third-party applications.  This is an orphan.

Reviewed by:	jesper
2001-02-26 20:10:16 +00:00
Jeroen Ruigrok van der Werven
05f15c3dc3 Remove conditionals for vax support.
People who care much about this are welcomed to try 2.11BSD. :)

Noticed by:	luigi
Reviewed by:	jesper
2001-02-26 20:05:32 +00:00
Jesper Skriver
694a9ff95b Remove tcp_drop_all_states, which is unneeded after jlemon removed it
from tcp_subr.c in rev 1.92
2001-02-25 17:20:19 +00:00
Jonathan Lemon
d8c85a260f Do not delay a new ack if there already is a delayed ack pending on the
connection, but send it immediately.  Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
2001-02-25 15:17:24 +00:00
Jonathan Lemon
c484d1a38c When converting soft error into a hard error, drop the connection. The
error will be passed up to the user, who will close the connection, so
it does not appear to make a sense to leave the connection open.

This also fixes a bug with kqueue, where the filter does not set EOF
on the connection, because the connection is still open.

Also remove calls to so{rw}wakeup, as we aren't doing anything with
them at the moment anyway.

Reviewed by: alfred, jesper
2001-02-23 21:07:06 +00:00
Jonathan Lemon
e4bb5b0572 Allow ICMP unreachables which map into PRC_UNREACH_ADMIN_PROHIB to
reset TCP connections which are in the SYN_SENT state, if the sequence
number in the echoed ICMP reply is correct.  This behavior can be
controlled by the sysctl net.inet.tcp.icmp_may_rst.

Currently, only subtypes 2,3,10,11,12 are treated as such
(port, protocol and administrative unreachables).

Assocaiate an error code with these resets which is reported to the
user application: ENETRESET.

Disallow resetting TCP sessions which are not in a SYN_SENT state.

Reviewed by: jesper, -net
2001-02-23 20:51:46 +00:00
Jesper Skriver
d1c54148b7 Redo the security update done in rev 1.54 of src/sys/netinet/tcp_subr.c
and 1.84 of src/sys/netinet/udp_usrreq.c

The changes broken down:

- remove 0 as a wildcard for addresses and port numbers in
  src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify
  all sessions with the specific remote address.
- change
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  to use in_pcbnotifyall() to notify multiple sessions, instead of
  using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to
  udp_ctlinput() and tcp_ctlinput(), so they will call
  in_pcbnotifyall() to notify all sessions with the specific
  remote address.

Approved by:	jlemon
Inspired by:    NetBSD
2001-02-22 21:23:45 +00:00
Jesper Skriver
43c77c8f5f Backout change in 1.153, as it violate rfc1122 section 3.2.1.3.
Requested by:	jlemon,ru
2001-02-21 16:59:47 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
Jesper Skriver
58e9b41722 Only call in_pcbnotify if the src port number != 0, as we
treat 0 as a wildcard in src/sys/in_pbc.c:in_pcbnotify()

It's sufficient to check for src|local port, as we'll have no
sessions with src|local port == 0

Without this a attacker sending ICMP messages, where the attached
IP header (+ 8 bytes) has the address and port numbers == 0, would
have the ICMP message applied to all sessions.

PR:		kern/25195
Submitted by:	originally by jesper, reimplimented by jlemon's advice
Reviewed by:	jlemon
Approved by:	jlemon
2001-02-20 23:25:04 +00:00
Jesper Skriver
2b18d82220 Send a ICMP unreachable instead of dropping the packet silent, if we
receive a packet not for us, and forwarding disabled.

PR:		kern/24512
Reviewed by:	jlemon
Approved by:	jlemon
2001-02-20 21:31:47 +00:00
Jesper Skriver
c2221099a9 Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Forgotten by phk, when committing fix in kern/23986

PR:		kern/23986
Reviewed by:	phk
Approved by:	phk
2001-02-20 21:11:29 +00:00
Brian Feldman
c0511d3b58 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
Poul-Henning Kamp
90fcbbd635 Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h

Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input

In src/sys/netinet/ip_icmp.c:icmp_input set code = PRC_UNREACH_ADMIN_PROHIB
or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG

Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst
to reflect the fact that we also react on ICMP unreachables that
are not administrative prohibited.  Also update the comments to
reflect this.

In sys/netinet/tcp_subr.c:tcp_ctlinput add code to treat
PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST different.

PR:		23986
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2001-02-18 09:34:55 +00:00
Luigi Rizzo
c1b843c774 remove unused data structure definition, and corresponding macro into*() 2001-02-18 07:10:03 +00:00
Jonathan Lemon
7c45cb9bca Clean up warning. 2001-02-15 22:32:06 +00:00
Jeroen Ruigrok van der Werven
e61c4bedda Add definitions for IPPROTO numbers 55-57. 2001-02-14 13:51:20 +00:00
Poul-Henning Kamp
bb07ec8c84 Introduce a new feature in IPFW: Check of the source or destination
address is configured on a interface.  This is useful for routers with
dynamic interfaces.  It is now possible to say:

        0100 allow       tcp from any to any established
        0200 skipto 1000 tcp from any to any
        0300 allow       ip from any to any
        1000 allow       tcp from 1.2.3.4 to me 22
        1010 deny        tcp from any to me 22
        1020 allow       tcp from any to any

and not have to worry about the behaviour if dynamic interfaces configure
new IP numbers later on.

The check is semi expensive (traverses the interface address list)
so it should be protected as in the above example if high performance
is a requirement.
2001-02-13 14:12:37 +00:00
Bosko Milekic
a57815efd2 Clean up RST ratelimiting. Previously, ratelimiting occured before tests
were performed to determine if the received packet should be reset. This
created erroneous ratelimiting and false alarms in some cases. The code
has now been reorganized so that the checks for validity come before
the call to badport_bandlim. Additionally, a few changes in the symbolic
names of the bandlim types have been made, as well as a clarification of
exactly which type each RST case falls under.

Submitted by: Mike Silbersack <silby@silby.com>
2001-02-11 07:39:51 +00:00
Luigi Rizzo
7e1cd0d23d Sync with the bridge/dummynet/ipfw code already tested in stable.
In ip_fw.[ch] change a couple of variable and field names to
avoid having types, variables and fields with the same name.
2001-02-10 00:10:18 +00:00
Jeroen Ruigrok van der Werven
1a6e52d0e9 Fix typo: seperate -> separate.
Seperate does not exist in the english language.
2001-02-06 11:21:58 +00:00
Poul-Henning Kamp
6817526d14 Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by:    mikeh
2001-02-06 10:12:15 +00:00
Julian Elischer
41d2ba5e27 Fix bad patch from a few days ago. It broke some bridging. 2001-02-05 21:25:27 +00:00
Poul-Henning Kamp
37d4006626 Another round of the <sys/queue.h> FOREACH transmogriffer.
Created with:   sed(1)
Reviewed by:    md5(1)
2001-02-04 16:08:18 +00:00
Darren Reed
185b71c73e fix duplicate rcsid 2001-02-04 15:25:15 +00:00
Darren Reed
f590526d0a fix conflicts 2001-02-04 14:26:56 +00:00
Darren Reed
d42c04169e Update IP Filter kernel source 2001-02-04 14:15:48 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
Poul-Henning Kamp
ef9e85abba Use <sys/queue.h> macro API. 2001-02-04 12:37:48 +00:00
Julian Elischer
c8f8e9c110 Make the code act the same in the case of BRIDGE being defined, but not
turned on, and the case of it not being defined at all.
i.e. Disabling bridging re-enables some of the checks it disables.

Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>
2001-02-03 17:25:21 +00:00
Jonathan Lemon
007581c0d8 When turning off TCP_NOPUSH, call tcp_output to immediately flush
out any data pending in the buffer.

Submitted by: Tony Finch <dot@dotat.at>
2001-02-02 18:48:25 +00:00
Luigi Rizzo
507b4b5432 MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately) 2001-02-02 00:18:00 +00:00
Brian Somers
435ff15c3b Add a few ``const''s to silence some -Wwrite-strings warnings 2001-01-29 11:44:13 +00:00
Brian Somers
4834b77d04 Ignore leading witespace in the string given to PacketAliasProxyRule(). 2001-01-29 00:30:01 +00:00
Luigi Rizzo
f8acf87bb5 Make sure we do not follow an invalid pointer in ipfw_report
when we get an incomplete packet or m_pullup fails.
2001-01-27 02:31:08 +00:00
Luigi Rizzo
26fb17bdd0 Minor cleanups after yesterday's patch.
The code (bridging and dummynet) actually worked fine!
2001-01-26 19:43:54 +00:00
Luigi Rizzo
6258acf88f Bring dummynet in line with the code that now works in -STABLE.
It compiles, but I cannot test functionality yet.
2001-01-26 06:49:34 +00:00
Luigi Rizzo
7a726a2dd1 Pass up errors returned by dummynet. The same should be done with
divert.
2001-01-25 02:06:38 +00:00
Garrett Wollman
a589a70ee1 Correct a comment. 2001-01-24 16:25:36 +00:00
Wes Peters
550b151850 When attempting to bind to an ephemeral port, if no such port is
available, the error return should be EADDRNOTAVAIL rather than
EAGAIN.

PR:		14181
Submitted by:	Dima Dorfman <dima@unixfreak.org>
Reviewed by:	Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
2001-01-23 07:27:56 +00:00
Luigi Rizzo
8b2cd62d7d Change critical section protection for dummynet from splnet() to
splimp() -- we need it because dummynet can be invoked by the
bridging code at splimp().

This should cure the pipe "stalls" that several people have been
reporting on -stable while using bridging+dummynet (the problem
would not affect routers using dummynet).
2001-01-22 23:04:13 +00:00
Dag-Erling Smørgrav
a3ea6d41b9 First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
 - remove zalloci() and zfreei(), which are now redundant.

Reviewed by:	bmilekic, jasone
2001-01-21 22:23:11 +00:00
Luigi Rizzo
ec97c79e30 Document data structures and operation on dummynet so next time
I or someone else browse through this code I do not have a hard
time understanding what is going on.
2001-01-17 01:09:40 +00:00
Luigi Rizzo
5da48f88bd Some dummynet patches that I forgot to commit last summer.
One of them fixes a potential panic when bridging is used and
you run out of mbufs (though i have no idea if the bug has
ever hit anyone).
2001-01-16 23:49:49 +00:00
Bosko Milekic
987efc765e Prototype inet_ntoa_r and thereby silence a warning from GCC. The function
is prototyped immediately under inet_ntoa, which is also from libkern.
2001-01-12 07:47:53 +00:00
Robert Watson
46a27060af o Minor style(9)ism to make consistent with -STABLE 2001-01-09 18:26:17 +00:00
Robert Watson
65450f2f77 o IPFW incorrectly handled filtering in the presence of previously
reserved and now allocated TCP flags in incoming packets.  This patch
  stops overloading those bits in the IP firewall rules, and moves
  colliding flags to a seperate field, ipflg.  The IPFW userland
  management tool, ipfw(8), is updated to reflect this change.  New TCP
  flags related to ECN are now included in tcp.h for reference, although
  we don't currently implement TCP+ECN.

o To use this fix without completely rebuilding, it is sufficient to copy
  ip_fw.h and tcp.h into your appropriate include directory, then rebuild
  the ipfw kernel module, and ipfw tool, and install both.  Note that a
  mismatch between module and userland tool will result in incorrect
  installation of firewall rules that may have unexpected effects.  This
  is an MFC candidate, following shakedown.  This bug does not appear
  to affect ipfilter.

Reviewed by:	security-officer, billf
Reported by:	Aragon Gouveia <aragon@phat.za.net>
2001-01-09 03:10:30 +00:00
Alfred Perlstein
3269187d41 provide a sysctl 'net.link.ether.inet.log_arp_wrong_iface' to allow one
to supress logging when ARP replies arrive on the wrong interface:
 "/kernel: arp: 1.2.3.4 is on dc0 but got reply from 00:00:c5:79:d0:0c on dc1"

the default is to log just to give notice about possibly incorrectly
configured networks.
2001-01-06 00:45:08 +00:00
Alfred Perlstein
da289f07ee Fix incorrect logic wouldn't disconnect incomming connections that had been
disconnected because they were not full.

Submitted by: David Filo
2001-01-03 19:50:23 +00:00
Assar Westerlund
598ce68dbd include tcp header files to get the prototype for tcp_seq_vs_sess 2000-12-27 03:02:29 +00:00
Poul-Henning Kamp
442fad6798 Update the "icmp_admin_prohib_like_rst" code to check the tcp-window and
to be configurable with respect to acting only in SYN or in all TCP states.

PR:		23665
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2000-12-24 10:57:21 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
Bill Fumerola
16cd6db04f Use getmicrotime() instead of microtime() when timestamping ICMP packets,
the former is quicker and accurate enough for use here.

Submitted by:	Jason Slagle <raistlin@toledolink.com> (on IRC)
Reviewed by:	phk
2000-12-16 21:39:48 +00:00
Poul-Henning Kamp
b11d7a4a2f We currently does not react to ICMP administratively prohibited
messages send by routers when they deny our traffic, this causes
a timeout when trying to connect to TCP ports/services on a remote
host, which is blocked by routers or firewalls.

rfc1122 (Requirements for Internet Hosts) section 3.2.2.1 actually
requi re that we treat such a message for a TCP session, that we
treat it like if we had recieved a RST.

quote begin.

            A Destination Unreachable message that is received MUST be
            reported to the transport layer.  The transport layer SHOULD
            use the information appropriately; for example, see Sections
            4.1.3.3, 4.2.3.9, and 4.2.4 below.  A transport protocol
            that has its own mechanism for notifying the sender that a
            port is unreachable (e.g., TCP, which sends RST segments)
            MUST nevertheless accept an ICMP Port Unreachable for the
            same purpose.

quote end.

I've written a small extension that implement this, it also create
a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control if
this new behaviour is activated.

When it's activated (set to 1) we'll treat a ICMP administratively
prohibited message (icmp type 3 code 9, 10 and 13) for a TCP
sessions, as if we recived a TCP RST, but only if the TCP session
is in SYN_SENT state.

The reason for only reacting when in SYN_SENT state, is that this
will solve the problem, and at the same time minimize the risk of
this being abused.

I suggest that we enable this new behaviour by default, but it
would be a change of current behaviour, so if people prefer to
leave it disabled by default, at least for now, this would be ok
for me, the attached diff actually have the sysctl set to 0 by
default.

PR:		23086
Submitted by:	Jesper Skriver <jesper@skriver.dk>
2000-12-16 19:42:06 +00:00
Bosko Milekic
09f81a46a5 Change the following:
1.  ICMP ECHO and TSTAMP replies are now rate limited.
  2.  RSTs generated due to packets sent to open and unopen ports
      are now limited by seperate counters.
  3.  Each rate limiting queue now has its own description, as
      follows:

      Limiting icmp unreach response from 439 to 200 packets per second
      Limiting closed port RST response from 283 to 200 packets per second
      Limiting open port RST response from 18724 to 200 packets per second
      Limiting icmp ping response from 211 to 200 packets per second
      Limiting icmp tstamp response from 394 to 200 packets per second

Submitted by: Mike Silbersack <silby@silby.com>
2000-12-15 21:45:49 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
959b7375ed Staticize some malloc M_ instances. 2000-12-08 20:09:00 +00:00
Jonathan Lemon
df5e198723 Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue.  Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue.  An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
2000-11-25 07:35:38 +00:00
Jonathan Lemon
e82ac18e52 Revert the last commit to the callout interface, and add a flag to
callout_init() indicating whether the callout is safe or not.  Update
the callers of callout_init() to reflect the new interface.

Okayed by: Jake
2000-11-25 06:22:16 +00:00
Bosko Milekic
a352dd9a71 Fixup (hopefully) bridging + ipfw + dummynet together...
* Some dummynet code incorrectly handled a malloc()-allocated pseudo-mbuf
  header structure, called "pkt," and could consequently pollute the mbuf
  free list if it was ever passed to m_freem(). The fix involved passing not
  pkt, but essentially pkt->m_next (which is a real mbuf) to the mbuf
  utility routines.

* Also, for dummynet, in bdg_forward(), made the code copy the ethernet header
  back into the mbuf (prepended) because the dummynet code that follows expects
  it to be there but it is, unfortunately for dummynet, passed to bdg_forward
  as a seperate argument.

PRs: kern/19551 ; misc/21534 ; kern/23010
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: bmilekic
Approved by: luigi
2000-11-23 22:25:03 +00:00
Ruslan Ermilov
1b7b85c4d6 mdoc(7) police: use the new feature of the An macro. 2000-11-22 08:47:35 +00:00
Bosko Milekic
0a1df235ba While I'm here, get rid of (now useless) MCLISREFERENCED and use MEXT_IS_REF
instead.
Also, fix a small set of "avail." If we're setting `avail,' we shouldn't
be re-checking whether m_flags is M_EXT, because we know that it is, as if
it wasn't, we would have already returned several lines above.

Reviewed by: jlemon
2000-11-11 23:05:59 +00:00
Ruslan Ermilov
203de3b494 Fixed the security breach I introduced in rev 1.145.
Disallow getsockopt(IP_FW_ADD) if securelevel >= 3.

PR:		22600
2000-11-07 09:20:32 +00:00
Jonathan Lemon
8735719e43 tp->snd_recover is part of the New Reno recovery algorithm, and should
only be checked if the system is currently performing New Reno style
fast recovery.  However, this value was being checked regardless of the
NR state, with the end result being that the congestion window was never
opened.

Change the logic to check t_dupack instead; the only code path that
allows it to be nonzero at this point is NewReno, so if it is nonzero,
we are in fast recovery mode and should not touch the congestion window.

Tested by:	phk
2000-11-04 15:59:39 +00:00
Ruslan Ermilov
1d02752206 Fixed the bug I have introduced in icmp_error() in revision 1.44.
The amount of data we copy from the original IP datagram into the
ICMP message was computed incorrectly for IP packets with payload
less than 8 bytes.
2000-11-02 09:46:23 +00:00
Ruslan Ermilov
506f494939 Wrong checksum may have been computed for certain UDP packets.
Reviewed by:	jlemon
2000-11-01 16:56:33 +00:00
Ruslan Ermilov
60123168be Wrong checksum used for certain reassembled IP packets before diverting. 2000-11-01 11:21:45 +00:00
Josef Karthauser
ffa37b3f9b It's no longer true that "nobody uses ia beyond here"; it's now
used to keep address based if_data statistics in.

Submitted by:	ru
2000-11-01 01:59:28 +00:00
Ruslan Ermilov
48cb400fb1 Do not waste a time saving a copy of IP header if we are certainly
not going to send an ICMP error message (net.inet.udp.blackhole=1).
2000-10-31 09:13:02 +00:00
Ruslan Ermilov
642cd09fb3 Added boolean argument to link searching functions, indicating
whether they should create a link if lookup has failed or not.
2000-10-30 17:24:12 +00:00
Ruslan Ermilov
03453c5e87 A significant rewrite of PPTP aliasing code.
PPTP links are no longer dropped by simple (and inappropriate in this
case) "inactivity timeout" procedure, only when requested through the
control connection.

It is now possible to have multiple PPTP servers running behind NAT.
Just redirect the incoming TCP traffic to port 1723, everything else
is done transparently.

Problems were reported and the fix was tested by:
		Michael Adler <Michael.Adler@compaq.com>,
		David Andersen <dga@lcs.mit.edu>
2000-10-30 12:39:41 +00:00
Poul-Henning Kamp
cf9fa8e725 Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Darren Reed
0c72e2855d Fix conflicts creted by import. 2000-10-29 07:53:05 +00:00
Darren Reed
a49e8152f5 Import IP filter 3.4.13 2000-10-29 07:50:11 +00:00
Josef Karthauser
fe93767490 Count per-address statistics for IP fragments.
Requested by:	ru
Obtained from:	BSD/OS
2000-10-29 01:05:09 +00:00
David E. O'Brien
d0eaa94443 Include sys/param.h for `__FreeBSD_version' rather than the non-existent
osreldate.h.

Submitted by:	dougb
2000-10-27 12:53:31 +00:00
Poul-Henning Kamp
46aa3347cb Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
Ruslan Ermilov
3cebc3e4de Fetch the protocol header (TCP, UDP, ICMP) only from the first fragment
of IP datagram.  This fixes the problem when firewall denied fragmented
packets whose last fragment was less than minimum protocol header size.

Found by:	Harti Brandt <brandt@fokus.gmd.de>
PR:		kern/22309
2000-10-27 07:19:17 +00:00
Ruslan Ermilov
b6ea1aa58d RFC 791 says that IP_RF bit should always be zero, but nothing
in the code enforces this.  So, do not check for and attempt a
false reassembly if only IP_RF is set.

Also, removed the dead code, since we no longer use dtom() on
return from ip_reass().
2000-10-26 13:14:48 +00:00
Darren Reed
60b88d9681 fix conflicts from rcsids 2000-10-26 12:33:42 +00:00
Darren Reed
b13a6dbb28 Import IP Filter 3.4.12 into kernel source tree 2000-10-26 12:28:47 +00:00
Ruslan Ermilov
7e2df4520d Wrong header length used for certain reassembled IP packets.
This was first fixed in rev 1.82 but then broken in rev 1.125.

PR:		6177
2000-10-26 12:18:13 +00:00
Luigi Rizzo
1f8ed85239 Close PR22152 and PR19511 -- correct the naming of a variable 2000-10-26 00:16:12 +00:00
Ruslan Ermilov
8829f4ee0b We now keep the ip_id field in network byte order all the
time, so there is no need to make the distinction between
ip_output() and ip_input() cases.

Reviewed by:	silence on freebsd-net
2000-10-25 10:56:41 +00:00
Jun-ichiro itojun Hagino
d31944e6ec be careful on mbuf overrun on ctlinput.
short icmp6 packet may be able to panic the kernel.
sync with kame.
2000-10-23 07:11:01 +00:00
Ruslan Ermilov
cc22c7a746 Save a few CPU cycles in IP fragmentation code. 2000-10-20 14:10:37 +00:00
Josef Karthauser
5da9f8fa97 Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.
2000-10-19 23:15:54 +00:00
Ruslan Ermilov
f136389613 A failure to allocate memory for auxiliary TCP data is now fatal.
This fixes a null pointer dereference problem that is unlikely to
happen in normal circumstances.
2000-10-19 10:44:44 +00:00
Ruslan Ermilov
0531ca1fd8 If we do not byte-swap the ip_id in the first place, don't do it in
the second.  NetBSD (from where I've taken this originally) needs
to fix this too.
2000-10-18 11:36:09 +00:00
Ruslan Ermilov
487bdb3855 Backout my wrong attempt to fix the compilation warning in ip_input.c
and instead reapply the revision 1.49 of mbuf.h, i.e.

Fixed regression of the type of the `header' member of struct pkthdr from
`void *' to caddr_t in rev.1.51.  This mainly caused an annoying warning
for compiling ip_input.c.

Requested by:	bde
2000-10-12 16:33:41 +00:00
Ruslan Ermilov
e6c89c1bd2 Fix the compilation warning. 2000-10-12 10:42:32 +00:00
Ruslan Ermilov
bc95ac80b2 Allow for IP_FW_ADD to be used in getsockopt(2) incarnation as
well, in which case return the rule number back into userland.

PR:		bin/18351
Reviewed by:	archie, luigi
2000-10-12 07:59:14 +00:00
Alfred Perlstein
abbfaeb87b Remove headers not needed.
Pointed out by: phk
2000-10-07 23:15:17 +00:00
Ruslan Ermilov
c0752e1657 As we now may check the TCP header window field, make sure we pullup
enough into the mbuf data area.  Solve this problem once and for all
by pulling up the entire (standard) header for TCP and UDP, and four
bytes of header for ICMP (enough for type, code and cksum fields).
2000-10-06 12:12:09 +00:00
Ruslan Ermilov
60f9125458 Added the missing ntohs() conversion when matching IP packet with
the IP_FW_IF_IPID rule.  (We have recently decided to keep the
ip_id field in network byte order inside the kernel, see revision
1.140 of src/sys/netinet/ip_input.c).

I did not like to have the conversion happen in userland, and I
think that the similar conversions for fw_tcp(seq|ack|win) should
be moved out of userland (src/sbin/ipfw/ipfw.c) into the kernel.
2000-10-03 12:18:11 +00:00
Jonathan Lemon
d17e895b5f If TCPDEBUG is defined, we could dereference a tp which was freed. 2000-10-02 15:00:13 +00:00
Ruslan Ermilov
c7f95f5372 A bit of indentation reformatting. 2000-10-02 13:13:24 +00:00
Bill Fumerola
9ad30943aa Add new fields for more granularity:
IP: version, tos, ttl, len, id
	TCP: seq#, ack#, window size

Reviewed by:  silence on freebsd-{net,ipfw}
2000-10-02 03:33:31 +00:00
Bill Fumerola
98b829924f Add new fields for more granularity:
IP: version, tos, ttl, len, id
	TCP: seq#, ack#, window size

Reviewed by:	silence on freebsd-{net,ipfw}
2000-10-02 03:03:31 +00:00
Ruslan Ermilov
3ea420e391 Document that net.inet.ip.fw.one_pass only affects dummynet(4).
Noticed by:	Peter Jeremy<peter.jeremy@alcatel.com.au>
2000-09-29 08:39:06 +00:00
Kris Kennaway
be515d91ad Use stronger random number generation for TCP_ISSINCR and tcp_iss.
Reviewed by:	peter, jlemon
2000-09-29 01:37:19 +00:00
Bosko Milekic
9d8c8a672c Finally make do_tcpdrain sysctl live under correct parent, _net_inet_tcp,
as opposed to _debug. Like before, default value remains 1.
2000-09-25 23:40:22 +00:00
Ruslan Ermilov
0122d6f195 Fixed the calculations with UDP header length field.
The field is in network byte order and contains the
size of the header.

Reviewed by:	brian
2000-09-21 06:52:59 +00:00
Kenjiro Cho
e645a1ca27 change the evaluation order of the rsvp socket in rsvp_input()
in favor of the new-style per-vif socket.

this does not affect the behavior of the ISI rsvpd but allows
another rsvp implementation (e.g., KOM rsvp) to take advantage
of the new style for particular sockets while using the old style
for others.

in the future, rsvp supporn should be replaced by more generic
router-alert support.

PR:		kern/20984
Submitted by:	Martin Karsten <Martin.Karsten@KOM.tu-darmstadt.de>
Reviewed by:	kjc
2000-09-17 13:50:12 +00:00
Poul-Henning Kamp
e4bdf25dc8 Properly jail UDP sockets. This is quite a bit more tricky than TCP.
This fixes a !root userland panic, and some cases where the wrong
interface was chosen for a jailed UDP socket.

PR:		20167, 19839, 20946
2000-09-17 13:35:42 +00:00
Poul-Henning Kamp
24b261c720 Reverse last commit, a better fix has been found. 2000-09-17 13:34:18 +00:00
Poul-Henning Kamp
e2cabba9d7 Make sure UDP sockets are explicitly bind(2)'ed [sic] before we connect(2)
them.

PR:     20946
Isolated by:    Aaron Gifford <agifford@infowest.com>
2000-09-17 11:34:33 +00:00
Jonathan Lemon
af1270f87f It is possible for a TCP callout to be removed from the timing wheel,
but have a network interrupt arrive and deactivate the timeout before
the callout routine runs.  Check for this case in the callout routine;
it should only run if the callout is active and not on the wheel.
2000-09-16 00:53:53 +00:00
Ruslan Ermilov
4996f02545 Add -Wmissing-prototypes. 2000-09-15 15:37:16 +00:00
Jonathan Lemon
a8db1d93f1 m_cat() can free its second argument, so collect the checksum information
from the fragment before calling m_cat().
2000-09-14 21:06:48 +00:00
Ruslan Ermilov
e30177e024 Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.
Requested by:	wollman
2000-09-14 14:42:04 +00:00
Bill Fumerola
95d0db2b40 Fix screwup in previous commit. 2000-09-12 02:38:05 +00:00
Archie Cobbs
6612c70eb1 Don't do snd_nxt rollback optimization (rev. 1.46) for SYN packets.
It causes a panic when/if snd_una is incremented elsewhere (this
is a conservative change, because originally no rollback occurred
for any packets at all).

Submitted by:	Vivek Sadananda Pai <vivek@imimic.com>
2000-09-11 19:11:33 +00:00
Alfred Perlstein
b47ce7f5cb Forget to include sysctl.h
Submitted by: des
2000-09-09 18:47:46 +00:00
Alfred Perlstein
34b94e8b82 Accept filter maintainance
Update copyrights.

Introduce a new sysctl node:
  net.inet.accf

Although acceptfilters need refcounting to be properly (safely) unloaded
as a temporary hack allow them to be unloaded if the sysctl
net.inet.accf.unloadable is set, this is really for developers who want
to work on thier own filters.

A near complete re-write of the accf_http filter:
  1) Parse check if the request is HTTP/1.0 or HTTP/1.1 if not dump
     to the application.
     Because of the performance implications of this there is a sysctl
     'net.inet.accf.http.parsehttpversion' that when set to non-zero
     parses the HTTP version.
     The default is to parse the version.
  2) Check if a socket has filled and dump to the listener
  3) optimize the way that mbuf boundries are handled using some voodoo
  4) even though you'd expect accept filters to only be used on TCP
     connections that don't use m_nextpkt I've fixed the accept filter
     for socket connections that use this.

This rewrite of accf_http should allow someone to use them and maintain
full HTTP compliance as long as net.inet.accf.http.parsehttpversion is
set.
2000-09-06 18:49:13 +00:00
Bill Fumerola
4897e8320e 1. IP_FW_F_{UID,GID} are _not_ commands, they are extras. The sanity checking
for them does not belong in the IP_FW_F_COMMAND switch, that mask doesn't even
apply to them(!).

2. You cannot add a uid/gid rule to something that isn't TCP, UDP, or IP.

XXX - this should be handled in ipfw(8) as well (for more diagnostic output),
but this at least protects bogus rules from being added.

Pointy hat:	green
2000-09-06 03:10:42 +00:00
Ruslan Ermilov
76e6ebd64e Match IPPROTO_ICMP with IP protocol field of the original IP
datagram embedded into ICMP error message, not with protocol
field of ICMP message itself (which is always IPPROTO_ICMP).

Pointed by:	Erik Salander <erik@whistle.com>
2000-09-01 16:38:53 +00:00
Ruslan Ermilov
04287599db Fixed broken ICMP error generation, unified conversion of IP header
fields between host and network byte order.  The details:

o icmp_error() now does not add IP header length.  This fixes the problem
  when icmp_error() is called from ip_forward().  In this case the ip_len
  of the original IP datagram returned with ICMP error was wrong.

o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host
  byte order, so DTRT and convert these fields back to network byte order
  before sending a message.  This fixes the problem described in PR 16240
  and PR 20877 (ip_id field was returned in host byte order).

o ip_ttl decrement operation in ip_forward() was moved down to make sure
  that it does not corrupt the copy of original IP datagram passed later
  to icmp_error().

o A copy of original IP datagram in ip_forward() was made a read-write,
  independent copy.  This fixes the problem I first reported to Garrett
  Wollman and Bill Fenner and later put in audit trail of PR 16240:
  ip_output() (not always) converts fields of original datagram to network
  byte order, but because copy (mcopy) and its original (m) most likely
  share the same mbuf cluster, ip_output()'s manipulations on original
  also corrupted the copy.

o ip_output() now expects all three fields, ip_len, ip_off and (what is
  significant) ip_id in host byte order.  It was a headache for years that
  ip_id was handled differently.  The only compatibility issue here is the
  raw IP socket interface with IP_HDRINCL socket option set and a non-zero
  ip_id field, but ip.4 manual page was unclear on whether in this case
  ip_id field should be in host or network byte order.
2000-09-01 12:33:03 +00:00
Ruslan Ermilov
816fa7febc Changed the way we handle outgoing ICMP error messages -- do
not alias `ip_src' unless it comes from the host an original
datagram that triggered this error message was destined for.

PR:		20712
Reviewed by:	brian, Charles Mott <cmott@scientech.com>
2000-09-01 09:32:44 +00:00
Ruslan Ermilov
0ac308534e Grab ADJUST_CHECKSUM() macro from alias_local.h. 2000-08-31 12:54:55 +00:00
Ruslan Ermilov
305d10699e Create aliasing links for incoming ICMP echo/timestamp requests.
This makes outgoing ICMP echo/timestamp replies to be de-aliased
with the right source IP, not exactly the primary aliasing IP.
2000-08-31 12:47:57 +00:00
Ruslan Ermilov
3e065e76ac Fixed the bug that div_bind() always returned zero
even if there was an error (broken in rev 1.9).
2000-08-30 14:43:02 +00:00
Ruslan Ermilov
2160daba07 Backout the hack in rev 1.71, I am working on a better patch
that should cover almost all inconsistencies in ICMP error
generation.
2000-08-30 08:28:06 +00:00
Andrey A. Chernov
d9e630b592 strtok -> strsep (no strtok allowed in libraries)
add unsigned char cast to ctype macro
2000-08-29 21:34:55 +00:00
Darren Reed
473998719e Apply appropriate patch.
PR:		20877
Submitted by:	Frank Volf (volf@oasis.IAEhv.nl)
2000-08-29 10:41:55 +00:00
Archie Cobbs
11840b0692 Remove obsolete comment. 2000-08-22 00:32:52 +00:00
Bruce Evans
4153a3a323 Fixed a missing splx() in if_addmulti(). Was broken in rev.1.28. 2000-08-19 22:10:10 +00:00
Jun-ichiro itojun Hagino
d1d1144bd7 repair endianness issue in IN_MULTICAST().
again, *BSD difference...

From: Nick Sayer <nsayer@quack.kfu.com>
2000-08-15 07:34:08 +00:00
Ruslan Ermilov
a05f06d79f Fixed PunchFW code segmentation violation bug.
Reported by:	Christian Schade <chris@cube.sax.de>
2000-08-14 15:24:47 +00:00
Ruslan Ermilov
b834663f47 Use queue(3) LIST_* macros for doubly-linked lists. 2000-08-14 14:18:16 +00:00
Darren Reed
5e90b39cba resolve conflicts 2000-08-13 04:31:06 +00:00
Darren Reed
6adaca6e12 Import IP Filter 3.4.9 bits into the kernel 2000-08-13 04:28:25 +00:00
Ruslan Ermilov
0eb10a0963 - Do not modify Peer's Call ID in outgoing Incoming-Call-Connected
PPTP control messages.

- Cosmetics: replace `GRE link' with `PPTP link'.

Reviewed by:	Erik Salander <erik@whistle.com>
2000-08-09 11:25:44 +00:00
Ruslan Ermilov
934a4fb381 Adjust TCP checksum rather than compute it afresh.
Submitted by:	Erik Salander <erik@whistle.com>
2000-08-07 09:51:04 +00:00
Archie Cobbs
7734ea0612 Improve performance in the case where ip_output() returns an error.
When this happens, we know for sure that the packet data was not
received by the peer. Therefore, back out any advancing of the
transmit sequence number so that we send the same data the next
time we transmit a packet, avoiding a guaranteed missed packet and
its resulting TCP transmit slowdown.

In most systems ip_output() probably never returns an error, and
so this problem is never seen. However, it is more likely to occur
with device drivers having short output queues (causing ENOBUFS to
be returned when they are full), not to mention low memory situations.

Moreover, because of this problem writers of slow devices were
required to make an unfortunate choice between (a) having a relatively
short output queue (with low latency but low TCP bandwidth because
of this problem) or (b) a long output queue (with high latency and
high TCP bandwidth). In my particular application (ISDN) it took
an output queue equal to ~5 seconds of transmission to avoid ENOBUFS.
A more reasonable output queue of 0.5 seconds resulted in only about
50% TCP throughput. With this patch full throughput was restored in
the latter case.

Reviewed by:	freebsd-net
2000-08-03 23:23:36 +00:00
Ruslan Ermilov
cec335f937 Make netstat(1) to be aware of divert(4) sockets. 2000-08-03 14:09:52 +00:00
Ollivier Robert
c34f578dd2 Change __FreeBSD_Version into the proper __FreeBSD_version.
Submitted by:	Alain.Thivillon@hsc.fr (Alain Thivillon) (for ip_fil.c)
2000-08-01 17:14:38 +00:00
Andrey A. Chernov
a089741a74 Add missing '0' to FreeBSD_version test: 50011 -> 500011 2000-08-01 00:04:24 +00:00
Andrey A. Chernov
c85540dd55 Nonexistent <sys/pfil.h> -> <net/pfil.h>
Kernel 'make depend' fails otherwise
2000-07-31 23:41:47 +00:00
Sheldon Hearn
71845bffc3 Whitespace only:
Fix an overlong line and trailing whitespace that crept in, in the
previous commit.
2000-07-31 13:49:21 +00:00
Darren Reed
c4ac87ea1c activate pfil_hooks and covert ipfilter to use it 2000-07-31 13:11:42 +00:00
Archie Cobbs
642e43b39b Add address translation support for RTSP/RTP used by RealPlayer and
Quicktime streaming media applications.

Add a BUGS section to the man page.

Submitted by:	Erik Salander <erik@whistle.com>
2000-07-26 23:15:46 +00:00
Jayanth Vijayaraghavan
e7f3269307 When a connection is being dropped due to a listen queue overflow,
delete the cloned route that is associated with the connection.
This does not exhaust the routing table memory when the system
is under a SYN flood attack. The route entry is not deleted if there
is any prior information cached in it.

Reviewed by: Peter Wemm,asmodai
2000-07-21 23:26:37 +00:00
Darren Reed
72130853fb fix conflicts 2000-07-19 14:02:09 +00:00
Darren Reed
4dca8a6de1 import ipfilter 3.4.8 2000-07-19 13:57:32 +00:00
Sheldon Hearn
571214d4fe Fix a comment which was broken in rev 1.36.
PR:		19947
Submitted by:	Tetsuya Isaki <isaki@net.ipc.hiroshima-u.ac.jp>
2000-07-18 16:43:29 +00:00
Luigi Rizzo
e07e817462 close PR 19544 - ipfw pipe delete causes panic when no pipes defined
PR: 19544
2000-07-17 20:03:27 +00:00
David Malone
cc72822764 Extra sanity check when arp proxyall is enabled. Don't send an arp
reply if the requesting machine isn't on the interface we believe
it should be. Prevents arp wars when you plug cables in the wrong
way around.

PR:		9848
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Not objected to by:	wollman
2000-07-13 19:31:01 +00:00
Jayanth Vijayaraghavan
7d20010979 re-enable the tcp newreno code. 2000-07-12 22:00:46 +00:00
Jun-ichiro itojun Hagino
f38211642f remove m_pulldown statistics, which is highly experimental and does not
belong to *bsd-merged tree
2000-07-12 16:39:13 +00:00
Jun-ichiro itojun Hagino
b474779f46 be more cautious about tcp option length field. drop bogus ones earlier.
not sure if there is a real threat or not, but it seems that there's
possibility for overrun/underrun (like non-NOP option with optlen > cnt).
2000-07-09 13:01:59 +00:00
Jun-ichiro itojun Hagino
686cdd19b1 sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
  (also syntax change)
2000-07-04 16:35:15 +00:00
Poul-Henning Kamp
77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp
82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Ruslan Ermilov
36e6576b44 Fixed PunchFWHole():
- ipfw always rejected rule with `neither in nor out' diagnostics.
- number of src/dst ports was not set properly.
2000-06-27 14:56:07 +00:00
Ruslan Ermilov
d15583713a - Removed PacketAliasPptp() API function.
- SHLIB_MAJOR++.
2000-06-20 13:07:52 +00:00
Ruslan Ermilov
55a39fc5a2 Added true support for PPTP aliasing. Some nice features include:
- Multiple PPTP clients behind NAT to the same or different servers.

- Single PPTP server behind NAT -- you just need to redirect TCP
  port 1723 to a local machine.  Multiple servers behind NAT is
  possible but would require a simple API change.

- No API changes!

For more information on how this works see comments at the start of
the alias_pptp.c.

PacketAliasPptp() is no longer necessary and will be removed soon.

Submitted by:	Erik Salander <erik@whistle.com>
Reviewed by:	ru
Rewritten by:	ru
Reviewed by:	Erik Salander <erik@whistle.com>
2000-06-20 11:41:48 +00:00
Alfred Perlstein
a79b71281c return of the accept filter part II
accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided, one that returns sockets when data
arrives the other when an http request is completed (doesn't work
with 0.9 requests)

Reviewed by: jmg
2000-06-20 01:09:23 +00:00
Ruslan Ermilov
b766604065 - Improved passive mode FTP support by aliasing 229 replies.
- Stricter checking of PORT/EPRT/227/229 messages format.
- Moved all security checks into one place.
2000-06-16 20:36:16 +00:00
Ruslan Ermilov
7652512976 - Added support for passive mode FTP by aliasing 227 replies.
It does mean that it is now possible to run passive-mode FTP
  server behind NAT.

- SECURITY: FTP aliasing engine now ensures that:
  o the segment preceding a PORT/227 segment terminates with a \r\n;
  o the IP address in the PORT/227 matches the source IP address of
    the packet;
  o the port number in the PORT command or 277 reply is greater than
    or equal to 1024.

Submitted by:	Erik Salander <erik@whistle.com>
Reviewed by:	ru
2000-06-14 16:09:35 +00:00
Luigi Rizzo
8a0b95d610 Fix behaviour of "ipfw pipe show" -- previous code gave
ambiguous data to the userland program (kernel operation was
safe, anyways).
2000-06-14 10:07:22 +00:00
Dan Moschuk
9714563d83 Add tcpoptions to ipfw. This works much in the same way as ipoptions do.
It also squashes 99% of packet kiddie synflood orgies.  For example, to
rate syn packets without MSS,

ipfw pipe 10 config 56Kbit/s queue 10Packets
ipfw add pipe 10 tcp from any to any in setup tcpoptions !mss

Submitted by:  Richard A. Steenbergen <ras@e-gerbil.net>
2000-06-08 15:34:51 +00:00
Luigi Rizzo
5d3fe434f8 Implement WF2Q+ in dummynet. 2000-06-08 09:45:23 +00:00
Jonathan Lemon
707d00a304 Add boundary checks against IP options.
Obtained from:	OpenBSD
2000-06-02 20:18:38 +00:00
Jonathan Lemon
59f577ad8c When attempting to transmit a packet, if the system fails to allocate
a mbuf, it may return without setting any timers.  If no more data is
scheduled to be transmitted (this was a FIN) the system will sit in
LAST_ACK state forever.

Thus, when mbuf allocation fails, set the retransmit timer if neither
the retransmit or persist timer is already pending.

Problem discovered by:  Mike Silbersack (silby@silby.com)
Pushed for a fix by:    Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:            jayanth
2000-06-02 17:38:45 +00:00
Darren Reed
6de9811ef7 define CSUM_DELAY_DATA to match merge 2000-05-26 07:28:03 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Darren Reed
f1beb78299 fix up #ifdef jungle for FreeBSD 2000-05-25 16:24:46 +00:00
Darren Reed
6774c05335 remove duplicate prototypes 2000-05-25 16:23:30 +00:00
Jonathan Lemon
50c6dc99d8 Mark the checksum as complete when looping back multicast packets.
Submitted by:	Jeff Gibbons <jgibbons@n2.net>
2000-05-25 02:27:14 +00:00
Archie Cobbs
06a429a3c8 Just need to pass the address family to if_simloop(), not the whole sockaddr. 2000-05-24 21:16:56 +00:00
Darren Reed
a4f66d8f4c fix duplicate rcsid's 2000-05-24 19:38:17 +00:00
Bruce Evans
582a77606f Fixed some style bugs (mainly convoluted logic for blackhole processing). 2000-05-24 12:57:52 +00:00
Peter Wemm
ebb6049b1f It would have been nice if this actually compiled. Close the header
comment */.
2000-05-24 09:08:55 +00:00
Darren Reed
6e067727a7 fix up conflicts 2000-05-24 04:40:17 +00:00
Darren Reed
255c925eef fix conflicts 2000-05-24 04:21:35 +00:00
Darren Reed
6dda709260 fix conflicts 2000-05-24 04:09:13 +00:00
Darren Reed
8982edd714 fix conflicts 2000-05-24 04:01:49 +00:00
Darren Reed
d2138b8dd4 fix conflicts 2000-05-24 04:01:30 +00:00
Darren Reed
329247db38 fix conflicts 2000-05-24 03:43:24 +00:00
Darren Reed
fe646be69a fix conflicts 2000-05-24 03:17:16 +00:00
Darren Reed
41d01fd6fa Import IP Filter 3.4.4 into the kernel 2000-05-24 02:55:58 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Dan Moschuk
4f14ee00f2 sysctl'ize ICMP_BANDLIM and ICMP_BANDLIM_SUPPRESS_OUTPUT.
Suggested by: des/nbm
2000-05-22 16:12:28 +00:00
Dan Moschuk
fcdc02160f Add option ICMP_BANDLIM_SUPPRESS_OUTPUT to the mix. With this option,
badport_bandlim() will not muck up your console with printf() messages.
2000-05-22 15:00:41 +00:00
Jonathan Lemon
1c23847582 Compute the checksum before handing the packet off to IPFilter.
Tested by:  Cy Schubert <Cy.Schubert@uumail.gov.bc.ca>
2000-05-21 21:26:06 +00:00
Peter Wemm
ff079ca4b1 Return ECONNRESET instead of EINVAL if the connection has been shot
down as a result of a reset.  Returning EINVAL in that case makes no
sense at all and just confuses people as to what happened.  It could be
argued that we should save the original address somewhere so that
getsockname() etc can tell us what it used to be so we know where the
problem connection attempts are coming from.
2000-05-19 00:55:21 +00:00
Jayanth Vijayaraghavan
d841727499 snd_cwnd was updated twice in the tcp_newreno function. 2000-05-18 21:21:42 +00:00
Jayanth Vijayaraghavan
75c6e0e253 Sigh, fix a rookie patch merge error.
Also-missed-by:	peter
2000-05-17 06:55:00 +00:00
Jonathan Lemon
5d5d5fc0bf Cast sizeof() calls to be of type (int) when they appear in a signed
integer expression.  Otherwise the sizeof() call will force the expression
to be evaluated as unsigned, which is not the intended behavior.

Obtained from:  NetBSD   (in a different form)
2000-05-17 04:05:07 +00:00
Jayanth Vijayaraghavan
6b2a5f92ba snd_una was being updated incorrectly, this resulted in the newreno
code retransmitting data from the wrong offset.

As a footnote, the newreno code was partially derived from NetBSD
and Tom Henderson <tomh@cs.berkeley.edu>
2000-05-16 03:13:59 +00:00
Ruslan Ermilov
3a06e3e02c Do not call icmp_error() if ipfirewall(4) denied packet.
PR:		kern/10747, kern/18382
2000-05-15 18:41:01 +00:00
Archie Cobbs
2e2de7f23f Move code to handle BPF and bridging for incoming Ethernet packets out
of the individual drivers and into the common routine ether_input().
Also, remove the (incomplete) hack for matching ethernet headers
in the ip_fw code.

The good news: net result of 1016 lines removed, and this should make
bridging now work with *all* Ethernet drivers.

The bad news: it's nearly impossible to test every driver, especially
for bridging, and I was unable to get much testing help on the mailing
lists.

Reviewed by:	freebsd-net
2000-05-14 02:18:43 +00:00
Jayanth Vijayaraghavan
4aae1da6dd Temporarily turn off the newreno flag until we can track down the known
data corruption problem.
2000-05-11 22:28:28 +00:00
Brian Somers
151682eadc Revert the default behaviour for incoming connections so
that they (once again) go to the target machine rather than
the alias address.

PR:		18354
Submitted by:	ru
2000-05-11 07:52:21 +00:00
Jun-ichiro itojun Hagino
fdcb8debf6 correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.
similar to recent fix to sys/netinet/ipf.c (by darren).
2000-05-10 01:25:33 +00:00
Darren Reed
68b16578ca Fix bug in dealing with "hlen == 1 and opt > 1" 2000-05-09 23:35:24 +00:00
Paul Saab
88c7d46bdc Add missing include machine/in_cksum.h.
Submitted by:	n_hibma
2000-05-09 16:56:51 +00:00
Kenneth D. Merry
62771f86e3 Include machine/in_cksum.h to unbreak options MROUTING. 2000-05-08 23:56:30 +00:00
Jonathan Lemon
72a52a35b4 Add #include <machine/in_cksum.h>, in order to pick up the checksum
inline functions and prototypes.
2000-05-06 18:19:58 +00:00
Jonathan Lemon
46f5848237 Implement TCP NewReno, as documented in RFC 2582. This allows
better recovery for multiple packet losses in a single window.
The algorithm can be toggled via the sysctl net.inet.tcp.newreno,
which defaults to "on".

Submitted by:  Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
2000-05-06 03:31:09 +00:00
Paul Richards
7a04c4f85a Force the address of the socket to be INADDR_ANY immediately before
calling in_pcbbind so that in_pcbbind sees a valid address if no
address was specified (since divert sockets ignore them).

PR:		17552
Reviewed by:	Brian
2000-05-02 23:53:46 +00:00
Luigi Rizzo
9078405886 Remove an unnecessary error message 2000-05-02 15:39:36 +00:00
Peter Wemm
365c5db0a7 Add $FreeBSD$ 2000-05-01 20:32:07 +00:00
Ruslan Ermilov
8060760500 Replace PacketAliasRedirectPptp() (which had nothing specific
to PPTP) with more generic PacketAliasRedirectProto().

Major number is not bumped because it is believed that noone
has started using PacketAliasRedirectPptp() yet.
2000-04-28 13:44:49 +00:00
Ruslan Ermilov
5e8fc2d2d3 Spell PacketAliasRedirectAddr() correctly. 2000-04-27 18:06:05 +00:00
Ruslan Ermilov
6d20a77450 Load Sharing using IP Network Address Translation (RFC 2391, LSNAT).
LSNAT links are first created by either PacketAliasRedirectPort() or
PacketAliasRedirectAddress() and then set up by one or more calls to
PacketAliasAddServer().
2000-04-27 17:37:03 +00:00
Yoshinobu Inoue
34e3e010b6 Let initialize th_sum before in6_cksum(), again.
Without this fix, all IPv6 TCP RST packet has wrong cksum value,
so IPv6 connect() trial to 5.0 machine won't fail until tcp connect timeout,
when they should fail soon.

Thanks to haro@tk.kubota.co.jp (Munehiro Matsuda) for his much debugging
help and detailed info.
2000-04-19 15:05:00 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Ruslan Ermilov
483d2f2296 Add support for multiple PPTP sessions:
- new API function: PacketAliasRedirectPptp()
- new mode bit: PKT_ALIAS_DENY_PPTP

Please see manual page for details.
2000-04-18 10:18:21 +00:00
Munechika SUMIKAWA
5e0ab69d23 ND6_HINT() should not be called unless the connection status is
ESTABLISHED.

Obtained from:	KAME Project
2000-04-17 20:27:02 +00:00
Ruslan Ermilov
b5e819ec23 Apply TCP_EXPIRE_CONNECTED (86400 seconds) timeout only to established
connections, after SYN packets were seen from both ends.  Before this,
it would get applied right after the first SYN packet was seen (either
from client or server).  With broken TCP connection attempts, when the
remote end does not respond with SYNACK nor with RST, this resulted in
having a useless (ie, no actual TCP connection associated with it) TCP
link with 86400 seconds TTL, wasting system memory.  With high rate of
such broken connection attempts (for example, remote end simply blocks
these connection attempts with ipfw(8) without sending RST back), this
could result in a denial-of-service.

PR:		bin/17963
2000-04-14 15:34:55 +00:00
Ruslan Ermilov
a29006665c A complete reformatting of manual page. 2000-04-13 14:04:01 +00:00
Ruslan Ermilov
f167e54283 Make partially specified permanent links without `dst_addr'
but with `dst_port' work for outgoing packets.

This case was not handled properly when I first fixed this
in revision 1.17.

This change is also required for the upcoming improved PPTP
support patches -- that is how I found the problem.

Before this change:

# natd -v -a aliasIP \
  -redirect_port tcp localIP:localPORT publicIP:publicPORT 0:remotePORT

Out [TCP]  [TCP] localIP:localPORT -> remoteIP:remotePORT aliased to
           [TCP] aliasIP:localPORT -> remoteIP:remotePORT

After this change:

# natd -v -a aliasIP \
  -redirect_port tcp localIP:localPORT publicIP:publicPORT 0:remotePORT

Out [TCP]  [TCP] localIP:localPORT -> remoteIP:remotePORT aliased to
           [TCP] publicIP:publicPORT -> remoteIP:remotePORT
2000-04-12 18:44:50 +00:00
Wes Peters
732f6a4376 PR: kern/17872
Submitted by:	csg@waterspout.com (C. Stephen Gunn)
2000-04-11 06:55:09 +00:00
Ruslan Ermilov
67b333b7e4 - Add support for FTP EPRT (RFC 2428) command.
- Minor optimizations.
- Minor spelling fixes.

PR:		14305
Submitted by:	ume
Rewritten by:	ru
2000-04-06 15:54:52 +00:00
Ruslan Ermilov
680c8244a9 - Remove unused includes.
- Minor spelling fixes.
- Make IcmpAliasOut2() really work.

Before this change:

# natd -v -n PUB_IFACE -p 12345 -redirect_address 192.168.1.1 P.P.P.P
natd[87923]: Aliasing to A.A.A.A, mtu 1500 bytes
In  [UDP]  [UDP] X.X.X.X:49562 -> P.P.P.P:50000 aliased to
           [UDP] X.X.X.X:49562 -> 192.168.1.1:50000
Out [ICMP] [ICMP] 192.168.1.1 -> X.X.X.X 3(3) aliased to
           [ICMP] A.A.A.A -> X.X.X.X 3(3)

# tcpdump -n -t -i PUB_IFACE host X.X.X.X and "(udp or icmp)"
tcpdump: listening on PUB_IFACE
X.X.X.X.49562 > P.P.P.P.50000: udp 3
A.A.A.A > X.X.X.X: icmp: A.A.A.A udp port 50000 unreachable

After this change:

# natd -v -n PUB_IFACE -p 12345 -redirect_address 192.168.1.1 P.P.P.P
natd[89360]: Aliasing to A.A.A.A, mtu 1500 bytes
In  [UDP]  [UDP] X.X.X.X:49563 -> P.P.P.P:50000 aliased to
           [UDP] X.X.X.X:49563 -> 192.168.1.1:50000
Out [ICMP] [ICMP] 192.168.1.1 -> X.X.X.X 3(3) aliased to
           [ICMP] P.P.P.P -> X.X.X.X 3(3)

# tcpdump -n -t -i PUB_IFACE host X.X.X.X and "(udp or icmp)"
tcpdump: listening on PUB_IFACE
X.X.X.X.49563 > P.P.P.P.50000: udp 3
P.P.P.P > X.X.X.X: icmp: P.P.P.P udp port 50000 unreachable
2000-04-05 14:27:34 +00:00
Ruslan Ermilov
79eef4b611 - Moved NULL definition into private include file.
- Minor spelling fixes.
2000-04-05 14:23:42 +00:00
Ruslan Ermilov
91cc2995af Minor spelling fixes. 2000-04-05 07:45:39 +00:00
Brian Somers
3e6ac25bd6 Correct Charles Mott's email address
Requested by: Charles Mott <cmott@scientech.com>
2000-04-02 20:16:45 +00:00
Yoshinobu Inoue
7cba257ae5 Move htons() ip_len to after the in_delayed_cksum() call.
This should stop cksum error messages on IPsec communication
which was reported on freebsd-current.

Reviewed by: jlemon
2000-04-02 16:18:26 +00:00
Paul Saab
75daea9395 Try and make the kernel build again without INET6. 2000-04-02 03:49:25 +00:00
Yoshinobu Inoue
fdaf052eb3 Support per socket based IPv4 mapped IPv6 addr enable/disable control.
Submitted by: ume
2000-04-01 22:35:47 +00:00
Jonathan Lemon
ea53ecd9d4 Calculate any delayed checksums before handing an mbuf off to a
divert socket.  This fixes a problem with ppp/natd.

Reviewed by:	bsd	(Brian Dean, gotta love that login name)
2000-04-01 18:51:03 +00:00
Brian Somers
5dd44916b7 Allow PacketAliasSetTarget() to be passed the following:
INADDR_NONE:   Incoming packets go to the alias address (the default)
  INADDR_ANY:    Incoming packets are not NAT'd (direct access to the
                 internal network from outside)
  anything else: Incoming packets go to the specified address

Change a few inaddr::s_addr == 0 to inaddr::s_addr == INADDR_ANY
while I'm there.
2000-03-31 20:36:29 +00:00
Brian Somers
1c4e6d2544 When an incoming packet is received that is not specifically
redirected and when no target address has been specified, NAT
the destination address to the alias address rather than
allowing people direct access to your internal network from
outside.
2000-03-31 14:03:37 +00:00
Jonathan Lemon
20c822f399 If `ipfw fwd' loops an mbuf back to ip_input from ip_output and the
mbuf is marked for delayed checksums, then additionally mark the
packet as having it's checksums computed.  This allows us to bypass
computing/checking the checksum entirely, which isn't really needeed
as the packet has never hit the wire.

Reviewed by:		green
2000-03-30 02:16:40 +00:00
Joerg Wunsch
f72d9d83f5 Peter Johnson found another log() call without a trailing newline.
All three of them have been introduced in rev 1.64, so i guess i've
got all of them now. :)

Submitted by:	Peter Johnson <locke@mcs.net>
2000-03-29 07:50:39 +00:00
Joerg Wunsch
e44d62832c Added two missing newlines in calls to log(9).
Reported in Usenet by: locke@mcs.net (Peter Johnson)

While i was at it, prepended a 0x to the %D output, to make it clear that
the printed value is in hex (i assume %D has been chosen over %#x to
obey network byte order).
2000-03-28 21:14:35 +00:00
Jonathan Lemon
db4f9cc703 Add support for offloading IP/TCP/UDP checksums to NIC hardware which
supports them.
2000-03-27 19:14:27 +00:00
Matthew Dillon
84365e2bcb Fix parens in m_pullup() line in arp handling code. The code was
improperly doing the equivalent of (m = (function() == NULL)) instead
    of ((m = function()) == NULL).

    This fixes a NULL pointer dereference panic with runt arp packets.
2000-03-23 18:58:59 +00:00
Brian Feldman
333aa64d05 in6_pcb.c:
Remove a bogus (redundant, just weird, etc.) key_freeso(so).
	There are no consumers of it now, nor does it seem there
	ever will be.

in6?_pcb.c:
	Add an if (inp->in6?p_sp != NULL) before the call to
	ipsec[46]_delete_pcbpolicy(inp).  In low-memory conditions
	this can cause a crash because in6?_sp can be NULL...
2000-03-22 02:27:30 +00:00
Larry Lile
b149dd6c66 o Replace most magic numbers related to token ring with #defines
from iso88025.h.

o Add minimal llc support to iso88025_input.

o Clean up most of the source routing code.

* Submitted by: Nikolai Saoukh <nms@otdel-1.org>
2000-03-19 21:34:39 +00:00
Brian Somers
9da582e318 Make _FindLinkIn() static and only define GetDestPort when
NO_FW_PUNCH isn't defined.
2000-03-19 09:11:05 +00:00
Ruslan Ermilov
d137714f11 Fix reporting of src and dst IP addresses for ICMP and generic IP packets.
PR:		17319
Submitted by:	Mike Heffner <spock@techfour.net>
2000-03-14 14:11:53 +00:00
Yoshinobu Inoue
820b57927e Disable IPv4 over IPv4 tunnel on the 6to4 interface for better security.
Approved by: jkh
2000-03-11 22:11:57 +00:00
Yoshinobu Inoue
4739b8076f IPv6 6to4 support.
Now most big problem of IPv6 is getting IPv6 address
   assignment.
   6to4 solve the problem. 6to4 addr is defined like below,

          2002: 4byte v4 addr : 2byte SLA ID : 8byte interface ID

   The most important point of the address format is that an IPv4 addr
   is embeded in it. So any user who has IPv4 addr can get IPv6 address
   block with 2byte subnet space. Also, the IPv4 addr is used for
   semi-automatic IPv6 over IPv4 tunneling.

   With 6to4, getting IPv6 addr become dramatically easy.
   The attached patch enable 6to4 extension, and confirmed to work,
   between "Richard Seaman, Jr." <dick@tar.com> and me.

Approved by: jkh

Reviewed by: itojun
2000-03-11 11:17:24 +00:00
Robert Watson
76ec7b2f60 The function arpintr() incorrectly checks m->m_len to detect incomplete
ARP packets. This can incorrectly reject complete frames since the frame
could be stored in more than one mbuf.

The following patches fix the length comparisson, and add several
diagnostic log messages to the interrupt handler for out-of-the-norm ARP
packets. This should make ARP problems easier to detect, diagnose and
fix.

Submitted by:	C. Stephen Gunn <csg@waterspout.com>
Approved by:	jkh
Reviewed by:	rwatson
2000-03-11 00:24:29 +00:00
Yoshinobu Inoue
f63e7634ac Initialize mbuf pointer at getting ipsec policy.
Without this, kernel will panic at getsockopt() of IPSEC_POLICY.
Also make compilable libipsec/test-policy.c which tries getsockopt() of
IPSEC_POLICY.

Approved by: jkh

Submitted by: sakane@kame.net
2000-03-09 14:57:16 +00:00
Sheldon Hearn
c6ff3a1bf7 Remove single-space hard sentence breaks. These degrade the quality
of the typeset output, tend to make diffs harder to read and provide
bad examples for new-comers to mdoc.
2000-03-02 09:14:21 +00:00
Luigi Rizzo
da3fc682a7 Fix panic when doing keep-state and "forward".
Removed a redundant check.
Also move check for expired rules before using them.
Sorry for the whitespace changes.

Approved-by: jordan
2000-02-29 17:51:25 +00:00
Paul Saab
f885f63606 Limit the maximum permissible TCP window size to 65535 octets if
window scaling is disabled.

PR:		kern/16914
Submitted by:	Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
Reviewed by:	wollman
Approved by:	jkh
2000-02-28 21:18:21 +00:00
Alfred Perlstein
47a11f04b3 -it do, among other things, clear out any
+it does, amongst other things, clear out any

The old sentance didn't seem to make sense.
2000-02-28 00:31:18 +00:00
Guido van Rooij
6d37c73e26 Remove option IPFILTER_KLD. In case you wanted to kldload ipfilter,
the module would only work in kernels built with this option.

Approved by:	jkh
2000-02-23 20:11:57 +00:00
Peter Wemm
242c5536ea Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs.  Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by:	jkh
2000-02-13 03:32:07 +00:00
Luigi Rizzo
f42c0d553a Forgot one line: don't try to match flags when looking for a flow.
Approved-by: jordan
2000-02-11 13:23:14 +00:00
Guido van Rooij
a993ad154a Re add rev 1.11 diffs to ip_fil.h Also discover that I did not undefine
CVS_FUBAR (which no longer exists) and thus forgot to add $FreeBSD's.
Add them.

Approved by: jkh (is part of ipfilter upgrade)
2000-02-10 21:29:11 +00:00
Yoshinobu Inoue
1aa540eb03 Forbid include of soem inet6 header files from wrong place
KAME put INET6 related stuff into sys/netinet6 dir, but IPv6
  standard API(RFC2553) require following files to be under sys/netinet.
    netinet/ip6.h
    netinet/icmp6.h
  Now those header files just include each following files.
    netinet6/ip6.h
    netinet6/icmp6.h

  Also KAME has netinet6/in6.h for easy INET6 common defs
  sharing between different BSDs, but RFC2553 requires only
  netinet/in.h should be included from userland.
  So netinet/in.h also includes netinet6/in6.h inside.

  To keep apps portability, apps should not directly include
  above files from netinet6 dir.
  Ideally, all contents of,
    netinet6/ip6.h
    netinet6/icmp6.h
    netinet6/in6.h
  should be moved into
    netinet/ip6.h
    netinet/icmp6.h
    netinet/in.h
  but to avoid big changes in this stage, add some hack, that
    -Put some special macro define into those files under neitnet
    -Let files under netinet6 cause error if it is included
     from some apps, and, if the specifal macro define is not
     defined.
     (which should have been defined if files under netinet is
     included)
    -And let them print an error message which tells the
     correct name of the include file to be included.

  Also fix apps which includes invalid header files.

Approved by: jkh

Obtained from: KAME project
2000-02-10 19:33:58 +00:00
Luigi Rizzo
9fcc079584 Move definition of fw_enable from ip_fw.c to ip_input.c
so we can compile kernels without IPFIREWALL .

Reported-by: Robert Watson
Approved-by: jordan
2000-02-10 17:56:01 +00:00
Luigi Rizzo
6355710df8 Whoops... forgot braces in a conditional
Revealed-by: diff with -STABLE version (the advantage of having
    multiple lines of development...)
Approved-by: jordan
2000-02-10 16:50:53 +00:00
Luigi Rizzo
6bc748b057 Support the net.inet.ip.fw.enable variable, part of
the recent ipfw modifications.

Approved-by: jordan
2000-02-10 14:19:53 +00:00
Luigi Rizzo
03c612662b Support for stateful (dynamic) ipfw rules. They are very
similar to ipfilter's keep-state.

Look at the updated ipfw(8) manpage for details.

Approved-by: jordan
2000-02-10 14:17:40 +00:00
Guido van Rooij
6cd756a2b5 Bring over ipfilter v3_3_8 kernel sources, including merging the
local modifications.
Also fix initializing fr_running in KLD case.
Rename ipl_inited to fr_runninhg in mlfk_ipl

Approved by: jkh
2000-02-09 20:56:36 +00:00
Yoshinobu Inoue
a683a7dd4f Avoid kernel panic when tcp rfc1323 and rfc1644 options are enabled
at the same time.

   When rfc1323 and rfc1644 option are enabled by sysctl,
   and tcp over IPv6 is tried, kernel panic happens by the
   following check in tcp_output(), because now hdrlen is bigger
   in such case than before.

/*#ifdef DIAGNOSTIC*/
        if (max_linkhdr + hdrlen > MHLEN)
                panic("tcphdr too big");
/*#endif*/

   So change the above check to compare with MCLBYTES in #ifdef INET6 case.
   Also, allocate a mbuf cluster for the header mbuf, in that case.

Bug reported at KAME environment.
Approved by: jkh

Reviewed by: sumikawa
Obtained from: KAME project
2000-02-09 00:34:40 +00:00
Luigi Rizzo
9fbf0caaac Fix a (mostly harmless) scheduling-in-the-past problem with
dummynet (already fixed in -stable, was waiting for Jordan's
approval due to the code freeze).

Reported-By: Mike Tancsa
Approved-By: Jordan
2000-02-04 16:45:33 +00:00
Archie Cobbs
b12cbc348c The flags PKT_ALIAS_PUNCH_FW and PKT_ALIAS_PROXY_ONLY were both
being defined as 0x40.  Change the former to be 0x100.

Submitted by:	Erik Salander <erik@whistle.com>
Approved by:	jkh
2000-02-02 23:49:32 +00:00
Brian Somers
21b9df573d Mention what PKT_ALIAS_PROXY_ONLY does.
Prompted by: archie
2000-02-02 23:42:06 +00:00
Yoshinobu Inoue
ae8d522734 Sorry in this just befor code freeze commit.
This is fix to usr.sbin/trpt and tcp_debug.[ch]
I think of putting this after 4.0 but,,,

 -There was bug that when INET6 is defined,
  IPv4 socket is not traced by trpt.

 -I received request from a person who distribute a program
  which use tcp_debug interface and print performance statistics,
  that
    -leave comptibility with old program as much as possible
    -use same interface with other OSes

  So, I talked with itojun, and synced API with netbsd IPv6 extension.

makeworld check, kernel build check(includes GENERIC) is done.

But if there happen to any problem, please let me know and
I soon backout this change.
2000-01-29 11:49:07 +00:00
Warner Losh
173c0f9f5c Mitigate the stream.c attacks
o Drop all broadcast and multicast source addresses in tcp_input.
o Enable ICMP_BANDLIM in GENERIC.
o Change default to 200/s from 100/s.  This will still stop the attack, but
  is conservative enough to do this close to code freeze.

This is not the optimal patch for the problem, but is likely the least
intrusive patch that can be made for this.

Obtained from: Don Lewis and Matt Dillon.
Reviewed by: freebsd-security
2000-01-28 06:13:09 +00:00
Yoshinobu Inoue
69a3468578 Avoid m_len and m_pkthdr.len inconsistency when changing m_len
for an mbuf whose M_PKTHDR is set.

PR: related to kern/15175
Reviewed by: archie
2000-01-25 01:26:47 +00:00
Yoshinobu Inoue
e2de10abe4 Fix the bug that IPv4 ttl is not initialized when AF_INET6 socket is used
for IPv4 communication.(IPv4 mapped IPv6 addr.)
Also removed IPv6 hoplimit initialization because it is alway done at
tcp_output.

Confirmed by: Bernd Walter <ticso@cicely5.cicely.de>
2000-01-25 01:05:18 +00:00
Brian Somers
367d34f853 Move the *intrq variables into net/intrq.c and unconditionally
include this in all kernels.  Declare some const *intrq_present
variables that can be checked by a module prior to using *intrq
to queue data.

Make the if_tun module capable of processing atm, ip, ip6, ipx,
natm and netatalk packets when TUNSIFHEAD is ioctl()d on.

Review not required by: freebsd-hackers
2000-01-24 20:39:02 +00:00
Yoshinobu Inoue
3a2a9f7976 Fixed the problem that IPsec connection hangs when bigger data is sent.
-opt_ipsec.h was missing on some tcp files (sorry for basic mistake)
  -made buildable as above fix
  -also added some missing IPv4 mapped IPv6 addr consideration into
   ipsec4_getpolicybysock
2000-01-15 14:56:38 +00:00
Yoshinobu Inoue
0567ae029a Added missing 'else' for 'if (isipv6)' at IPv6 length setting in tcp_respond().
By this bug, IPv6 reset was not sent.
(I checked around same kind of bug, but no other found.)
2000-01-15 14:34:56 +00:00
Yoshinobu Inoue
595ad28111 Removed wrong(unnecessary) & operators for pointer, in ipsec_hdrsiz_tcp().
This must be one of the reason why connections over IPsec hangs for
bigger packets.(which was reported on freebsd-current@freebsd.org)

But there still seems to be another bug and the problem is not yet fixed.
2000-01-15 05:39:28 +00:00
Yoshinobu Inoue
bb913f0f78 add forward declarations, and small cosmetic changes.
Submitted by: bde
2000-01-15 05:20:40 +00:00
Guido van Rooij
b33271069e Apply patches in rev 1.2 and 1.9 that I forgot
Pointe out by:	bde
2000-01-14 19:48:42 +00:00
Rodney W. Grimes
d05257b0f2 Replace beforeinstall target with new variables used by .mk system.
Reviewed by:	marcel, and make world
2000-01-14 07:57:47 +00:00
Guido van Rooij
0f9f7e9f23 Bring over ipfilter kernel sources, including merging the local modifications. 2000-01-13 19:01:33 +00:00
Yoshinobu Inoue
5d60ed0e69 Change struct sockaddr_storage member name, because following change
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.

  s/__ss_family/ss_family/
  s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.
2000-01-13 14:52:53 +00:00
Yoshinobu Inoue
21ab895ff5 Clear rt after RTFREE. This might have sometime caused kernel panic at rtfree()
on INET6 enabled environment.
2000-01-13 14:21:30 +00:00
Yoshinobu Inoue
8972cdb14e add a comment for some possible? IPv4 option processing. 2000-01-13 05:21:05 +00:00
Yoshinobu Inoue
72e174286c removed incorrect ip6 length setting for IPv6 tcp reset packet. 2000-01-13 05:18:30 +00:00
Ruslan Ermilov
5db1e34ea4 MGETHDR() does not initialize m_pkthdr.rcvif, do it here.
This fixes page fault panic observed when diverting packets
with IP options (e.g. ping -R remoteIP over natd).

PR:	kern/8596, kern/11199
2000-01-10 18:46:05 +00:00
Yoshinobu Inoue
fb59c426ff tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2000-01-09 19:17:30 +00:00
Yoshinobu Inoue
d0a98d79d2 enable IPsec over DUMMYNET again
Submitted by: luigi
Reviewed by: luigi
2000-01-09 03:06:28 +00:00
Yoshinobu Inoue
0ba9128b0c prevent kernel panic which happens when either of IPSEC and IPDIVERT
is enabled.

Confirmed by: Eugene M. Kim <ab@astralblue.com>
2000-01-08 12:53:48 +00:00
Luigi Rizzo
ec8fac2acf Add ipfw hooks for the new dummynet features.
Support masks on TCP/UDP ports.

Minor cleanup of ip_fw_chk() to avoid repeated calls to PULLUP_TO
at each rule.
2000-01-08 11:31:43 +00:00
Luigi Rizzo
d1f04b29f0 Cleanup dummynet call interface so it should now work on the Alpha
as well. Also (probably) fix a bug introduced during the IPv6 import.
2000-01-08 11:28:23 +00:00
Luigi Rizzo
988790bfd9 Implement per-flow queueing. Using a single pipe config rule,
now you can dynamically create rate-limited queues for different
flows using masks on dst/src IP, port and protocols.
Read the ipfw(8) manpage for details and examples.

Restructure the internals of the traffic shaper to use heaps,
so that it manages efficiently large number of queues.

Fix a bug which was present in the previous versions which could
cause, under certain unfrequent conditions, to send out very large
bursts of traffic.

All in all, this new code is much cleaner than the previous one and
should also perform better.

Work supported by Akamba Corp.
2000-01-08 11:24:46 +00:00
Eivind Eklund
3418fe8734 KERNEL -> _KERNEL 2000-01-05 16:25:20 +00:00
Peter Wemm
664a31e496 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 04:46:21 +00:00
Mike Smith
052a6aad2c Make tcp_drain() actually do something. When invoked (usually as a
desperation measure in low-memory situations), walk the tcpbs and
flush the reassembly queues.

This behaviour is currently controlled by the debug.do_tcpdrain sysctl
(defaults to on).

Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	wollman
1999-12-28 23:18:33 +00:00
Yoshinobu Inoue
6a800098cc IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
1999-12-22 19:13:38 +00:00
Eivind Eklund
369dc8ceb8 Change incorrect NULLs to 0s 1999-12-21 11:14:12 +00:00
Peter Wemm
bbc6383b4a The ipfilter module name wasn't exactly conventional.. 1999-12-20 15:49:38 +00:00
Brian Feldman
d25f3712b7 M_PREPEND-related cleanups (unregisterifying struct mbuf *s). 1999-12-19 01:55:37 +00:00
Jonathan Lemon
c0f7fd5575 Use SEQ_* macros for comparing sequence space numbers.
Reviewed by:	truckman
1999-12-14 15:43:56 +00:00
Yoshinobu Inoue
79ea3cf110 Always set INP_IPV4 flag for IPv4 pcb entries, because netstat needs it
to print out protocol specific pcb info.

A patch submitted by guido@gvr.org, and asmodai@wxs.nl also reported
the problem.
Thanks and sorry for your troubles.

Submitted by: guido@gvr.org
Reviewed by: shin
1999-12-13 00:39:20 +00:00
Jonathan Lemon
1a244a616d According to RFC 793, a reset should be honored if the sequence number
is within the receive window.  Follow this behavior, instead of only
allowing resets at last_ack_sent.

Pointed out by:	jayanth@yahoo-inc.com
1999-12-11 04:05:52 +00:00
Archie Cobbs
a9405f0edb Fix a '&&' that should have been a '&'.
Submitted by:	Erik Salander <erik@whistle.com>
1999-12-10 20:04:53 +00:00
Archie Cobbs
61989d76a5 Fix several typos.
Submitted by:	Erik Salander <erik@whistle.com>
1999-12-09 21:36:34 +00:00
Yoshinobu Inoue
f51ff7fadd Make this buildable with MROUTING defined.
Specified by: eivind, phk
1999-12-08 11:57:36 +00:00
Yoshinobu Inoue
cfa1ca9dfa udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
1999-12-07 17:39:16 +00:00
Guido van Rooij
ba5d8fc721 Last minute patch that I forgot to apply: check return code of iplattach() 1999-12-06 21:21:47 +00:00
cvs2svn
4478e3fbfb This commit was manufactured by cvs2svn to create branch 'DARRENR'. 1999-12-06 20:36:51 +00:00
Guido van Rooij
05ec607970 Revive mlfk_ipl here. This version is slightly changed from
the old one: an unnecessary define (KLD_MODULE) has been deleted and
the initialisation of the module is  done after domaininit was called
to be sure inet is running.

Some slight changed were made to ip_auth.c and ip_state.c in order
to assure including of sys/systm.h in case we make a kld

Make sure ip_fil does nmot include osreldate in kernel mode

Remove mlfk_ipl.c from here: no sources allowed in these directories!
1999-12-06 20:36:50 +00:00
Archie Cobbs
8948e4ba8e Miscellaneous fixes/cleanups relating to ipfw and divert(4):
- Implement 'ipfw tee' (finally)
- Divert packets by calling new function divert_packet() directly instead
  of going through protosw[].
- Replace kludgey global variable 'ip_divert_port' with a function parameter
  to divert_packet()
- Replace kludgey global variable 'frag_divert_port' with a function parameter
  to ip_reass()
- style(9) fixes

Reviewed by:	julian, green
1999-12-06 00:43:07 +00:00
Jonathan Lemon
c0a929b430 Change the delayed ack time from 200ms to 100ms.
This results in closer behavior to earlier versions, where the fixed
200ms timer actually resulted in a delay anywhere from 1..200ms, with
the average delay being 100ms.

Pointed out by:	 dg
1999-12-02 03:25:19 +00:00
Luigi Rizzo
b73ccac130 RTFREE the correct route entry in dummynet_io(). The previous
code failed in handling things like "forward" actions.

Reported-and-tested-by: Jean-Hugues ROYER jhroyer@joher.com
1999-11-26 13:37:09 +00:00
Guido van Rooij
47e3b8ce13 Get rid of useless osreldate include for KLD/LKM modules (sys/param.h
already carries what is needed).
This is needed for the KLD support.
1999-11-23 22:16:41 +00:00
Guido van Rooij
9cc86ee930 Add kernel parts of revived ipfilter (3.3.3.) 1999-11-23 21:44:59 +00:00
Yoshinobu Inoue
82cd038d51 KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
1999-11-22 02:45:11 +00:00
Peter Wemm
45d3a132c4 Fix a warning and a potential panic if TCPDEBUG is active. (tp is
a wild pointer and used by TCPDEBUG2())
1999-11-18 08:28:24 +00:00
Poul-Henning Kamp
12b4fd063c The logic for blackhole processing does not free mbufs if the
blackhole flag is set.

PR:		14958
Submitted by:	Larry Baird <lab@gta.com>
Reviewed by:	phk
1999-11-17 20:57:49 +00:00
Jonathan M. Bresler
4095364274 add two more codes to ICMP error 12 (Parameter Problem).
these two are detailed in RFC1700.

Reviewed by:	Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
1999-11-15 18:27:30 +00:00
Alexey Zelkin
1855100f8f Restore sub-chapters order.
PR:		docs/14766
Submitted by:	Kazutoshi Kubota <kazu@iworks.co.jp>
1999-11-09 00:24:09 +00:00
Jonathan Lemon
5dc0f70d74 Undo rev 1.10, which took out TH_FIN from the CLOSING state. This
breaks simultaneous closes.
1999-11-07 04:18:30 +00:00
Yoshinobu Inoue
76429de41a KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project
1999-11-05 14:41:39 +00:00
Sheldon Hearn
0b757633b3 Append missing newline to log() message for permanent ARP modification
attempt warning, which was added in rev 1.48 .

PR:	14371
Submitted by:	sec@pi.musin.de (Stefan `Sec` Zehl)
1999-10-18 11:56:50 +00:00
Peter Wemm
088f7c5d38 Nuke the old antique copy of ipfilter from the tree. This is old enough
to be dangerous.  It will better serve us as a port building a KLD,
ala SKIP.

The hooks are staying although it would be better to port and use
the NetBSD pfil interface rather than have custom hooks.
1999-10-10 15:09:59 +00:00
Brian Feldman
ecf723083f Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total
usage limit.
1999-10-09 20:42:17 +00:00
Ruslan Ermilov
838d9af2c8 Properly handle the case when either the aliasing or source address of
the link are equal to the default aliasing address.  Do not zero them!

This will fix the problem with non-working links added with the source
and/or aliasing address equal to the default aliasing address, but the
default aliasing address is set later, after the link has been set up,
like both natd(8) and ppp(8) do (for objective reasons).

Reviewed by:	Brian Somers <brian@FreeBSD.org>,
		Eivind Eklund <eivind@FreeBSD.org>,
		Charles Mott <cmott@srv.net>
1999-09-27 08:40:36 +00:00
Poul-Henning Kamp
d6a0e38a1b Remove five now unused fields from struct cdevsw. They should never
have been there in the first place.  A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags
1999-09-25 18:24:47 +00:00
Ruslan Ermilov
bd3ed4542d ReLink() partial links in FindLinkOut() in the same manner as we do it
in FindLinkIn().  This will make TcpMonitorIn()/TcpMonitorOut() happy.

Reviewed by:	eivind
1999-09-22 13:22:26 +00:00
Ruslan Ermilov
f3baa77e5e Restore previous version of FindLinkIn().
Instead, natd(8) should be fixed to call PacketAliasSetAddress()
as part of initialization, as required by libalias(3).
1999-09-21 14:44:32 +00:00
Ruslan Ermilov
02136bf8b0 - Make partially specified permanent links (without `dst_addr' and/or
`dst_port') work for outgoing packets.

- Make permanent links whose `alias_addr' matches the primary aliasing
  address `aliasAddress' work for incoming packets.

- Typo fixes.

Reviewed by:	brian, eivind
1999-09-21 08:40:20 +00:00
Brian Somers
32277d8b6d sys/errno.h -> errno.h 1999-09-21 01:26:49 +00:00
Brian Feldman
2f9a21326c Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually.
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.
1999-09-19 02:17:02 +00:00
Larry Lile
f9083fdb2a Re-arrange the arp code so that fddi arps work properly. 1999-09-16 00:35:39 +00:00
Dag-Erling Smørgrav
6c3b5f69ba Reorder. 1999-09-14 16:40:28 +00:00
Dag-Erling Smørgrav
f861330504 Fix some more disordering, as well as the description string for the
net.inet.tcp.drop_synfin sysctl, which for some mysterious reason said
"Drop TCP packets with FIN+ACK set" (instead of "...with SYN+FIN set")
1999-09-14 16:14:05 +00:00
Dag-Erling Smørgrav
e46cd3d4d2 Add the net.inet.tcp.restrict_rst and net.inet.tcp.drop_synfin sysctl
variables, conditional on the TCP_RESTRICT_RST and TCP_DROP_SYNFIN kernel
options, respectively. See the comments in LINT for details.
1999-09-12 17:22:08 +00:00
Ruslan Ermilov
92da29a00d - Optimization to the previous (rev 1.15) commit.
Requested by:	eivind
Discussed with:	eivind
Reviewed by:	brian, eivind
1999-09-10 15:27:34 +00:00
Ruslan Ermilov
29d958bb8a Handle TCP reset sequence properly.
In the words of originator:
:If an incoming connection is initiated through natd and deny_incoming is
:not set, then a new alias_link structure is created to handle the link.
:If there is nothing listening for the incoming connection, then the kernel
:responds with a RST for the connection. However, this is not processed
:correctly in libalias/alias.c:TcpMonitor{In,Out} and
:libalias/alias_db.c:SetState{In,Out} as it thinks a connection
:has been established and therefore applies a timeout of 86400 seconds
:to the link.
:
:If many of these half-connections are initiated (during, for example, a
:port scan of the host), then many thousands of unnecessary links are
:created and the resident size of natd balloons to 20MB or more.

PR:		13639
Reviewed by:	brian
1999-09-09 13:42:51 +00:00
Ruslan Ermilov
2f89696765 Fix typo. 1999-09-08 16:37:14 +00:00
Jonathan Lemon
9fc2bcf662 Simplify, and return an error if the user attempts to set a TCP
time value which results in < 1 tick.

Suggested by: 	bde
1999-08-31 16:34:20 +00:00
Jonathan Lemon
9987d77844 Remove conversion macros that were used during development. 1999-08-31 16:31:07 +00:00
Jonathan Lemon
ccb4d0c653 Add a SYSCTL_PROC so that TCP timer values are now expressed to
the user in ms, while they are stored internally as ticks. Note
that there probably are rounding bogons here, especially on the
alpha.
1999-08-31 03:40:24 +00:00
Jonathan Lemon
9b8b58e033 Restructure TCP timeout handling:
- eliminate the fast/slow timeout lists for TCP and instead use a
    callout entry for each timer.
  - increase the TCP timer granularity to HZ
  - implement "bad retransmit" recovery, as presented in
    "On Estimating End-to-End Network Path Properties", by Allman and Paxson.

Submitted by:	jlemon, wollmann
1999-08-30 21:17:07 +00:00
Bill Fumerola
a5a388c7ab Add $FreeBSD$ and spell Eklund properly.
Approved by:	brian (well, he approved adding $Id$)
1999-08-29 23:17:04 +00:00
David E. O'Brien
5a8c77a83c Remove extra indenting of `break' statements introducted in rev 1.89,
plus wrap some long lines from that revision.

While here, wrap some other long lines.
1999-08-29 21:59:03 +00:00
Dag-Erling Smørgrav
27108a1511 Include the correct header for the IPSTEALTH option. 1999-08-29 12:18:39 +00:00
Bruce Evans
684f9417a2 Oops, I missed a cast in rev.1.119. 1999-08-29 10:23:13 +00:00
Larry Lile
fcf11853dc It is much easier to arp if you don't truncate your arp-reply's.
[affects token-ring only]
1999-08-28 14:57:12 +00:00
Brian Feldman
78f9020e95 Also make the "other" packets counter resettable. 1999-08-28 07:20:59 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Peter Wemm
7f3dea244c $Id$ -> $FreeBSD$ 1999-08-28 00:22:10 +00:00
Brian Feldman
4d1bb12d6c Correction: uid -> gid (comment) 1999-08-27 23:46:02 +00:00
Jonathan Lemon
6da3d6578b Add readonly OID ``net.inet.tcp.tcbhashsize'' so it is possible to
discover the size of the TCB hashtable on a running system.
1999-08-26 19:52:17 +00:00
Bruce Evans
ff0061bb1d Cast pointers to [u]intptr_t instead of casting them to [u_]long. Don't
depend on gcc's feature of casting lvalues, especially for direct
assignment where it doesn't even simplify the syntax.  Cosmetic.
1999-08-24 00:48:19 +00:00
Brian Somers
7765ab6476 Aallow ppp to work with Nortel Networks Extranet Switch
product and Windows NT tunneling.

Submitted by: Chain Lee <chain@nortelnetworks.com>
1999-08-22 23:32:01 +00:00
Tim Vanderhoek
a395af9036 Typo: 102 => 192 (PR: docs/13310 - Maxim Sobolev <sobomax@altavista.net>) 1999-08-22 19:23:33 +00:00
Brian Feldman
32e7924603 To christen the brand new security category for syslog, we get IPFW
using syslog(3) (log(9)) for its various purposes! This long-awaited
change also includes such nice things as:
	* macros expanding into _two_ comma-delimited arguments!
	* snprintf!
	* more snprintf!
	* linting and criticism by more people than you can shake a stick at!
	* a slightly more uniform message style than before!
	 and last but not least
	* no less than 5 rewrites!

Reviewed by:	committers
1999-08-21 18:35:55 +00:00
Geoff Rehmet
828b7f4069 Fix breakage if blackhole=1 and tiflags & TH_SYN, plus
style(9) fixes

Submitted by:	 Jonathon Lemon
1999-08-19 05:22:12 +00:00
Geoff Rehmet
2e4e1b4c31 Slight tweak to tcp.blackhole to add optional behaviour to
drop any segment arriving at a closed port.
tcp.blackhole=1 - only drop SYN without RST
tcp.blackhole=2 - drop everything without RST
tcp.blackhole=0 - always send RST - default behaviour

This confuses nmap -sF or -sX or -sN quite badly.
1999-08-18 15:40:05 +00:00
Bill Fumerola
ed8bcdec67 Fix a printf() formatter to match its variable.
Reviewed by:	bde, luigi
1999-08-17 22:10:00 +00:00
Geoff Rehmet
16f7f31f04 Add net.inet.tcp.blackhole and net.inet.udp.blackhole
sysctl knobs.

With these knobs on, refused connection attempts are dropped
without sending a RST, or Port unreachable in the UDP case.
In the TCP case, sending of RST is inhibited iff the incoming
segment was a SYN.

Docs and rc.conf settings to follow.
1999-08-17 12:17:53 +00:00
Mike Pritchard
74804d58a0 Various man page cleanup:
- Sort xrefs
- FreeBSD.ORG -> FreeBSD.org
- Be consistent with section names as outlines in mdoc(7)
- Other misc mdoc cleanup.

PR:		doc/13144
Submitted by:	Alexy M. Zelkin <phantom@cris.net>
1999-08-15 09:51:25 +00:00
Luigi Rizzo
772759420f Implement probabilistic rule match in ipfw. Each rule can be associated
with a match probability to achieve non-deterministic behaviour of
the firewall. This can be extremely useful for testing purposes
such as simulating random packet drop without having to use dummynet
(which already does the same thing), and simulating multipath effects
and the associated out-of-order delivery (this time in conjunction
with dummynet).

The overhead on normal rules is just one comparison with 0.

Since it would have been trivial to implement this by just adding
a field to the ip_fw structure, I decided to do it in a
backward-compatible way (i.e. struct ip_fw is unchanged, and as a
consequence you don't need to recompile ipfw if you don't want to
use this feature), since this was also useful for -STABLE.

When, at some point, someone decides to change struct ip_fw, please
add a length field and a version number at the beginning, so userland
apps can keep working even if they are out of sync with the kernel.
1999-08-11 15:34:47 +00:00
Luigi Rizzo
706aa7f870 Add spl() protection to remove that the timer is invoked multiple
times resulting in higher bandwidth and lower delays.
Reported-by: Jamshid Madhavi
1999-08-11 14:37:58 +00:00
Dag-Erling Smørgrav
18d3153ead Add net.inet.icmp.log_redirect and net.inet.icmp.drop_redirect, for
respectively logging and dropping ICMP REDIRECT packets.

Note that there is no rate limiting on the log messages, so log_redirect
should be used with caution (preferrably only for debugging purposes).
1999-08-10 09:45:33 +00:00
Brian Feldman
0b6c1a832d Make ipfw's logging more dynamic. Now, log will use the default limit
_or_ you may specify "log logamount number" to set logging specifically
the rule.
   In addition, "ipfw resetlog" has been added, which will reset the
logging counters on any/all rule(s). ipfw resetlog does not affect
the packet/byte counters (as ipfw reset does), and is the only "set"
command that can be run at securelevel >= 3.
   This should address complaints about not being able to set logging
amounts, not being able to restart logging at a high securelevel,
and not being able to just reset logging without resetting all of the
counters in a rule.
1999-08-01 16:57:24 +00:00
Brian Feldman
7558f6aad9 8 -> NBBy 1999-07-28 22:27:27 +00:00
Brian Feldman
f8075bf9b3 Correct a really gross comment format. 1999-07-28 22:22:57 +00:00
Jonathan M. Bresler
e9bd3a37e8 fix comment re: RST received in TIME_WAIT to match the code. 1999-07-18 14:42:48 +00:00
Brian Feldman
24ad8fe519 Correct a mistake in so_cred changes. In practice, I don't think that it
would make a difference. However, my previous diff _did_ change the
behavior in some way (not necessarily break it), so I'm fixing it.

Found by:	bde
Submitted by:	bde
1999-07-12 18:58:23 +00:00
Brian Feldman
490d50b60a Two new sysctls: net.inet.tcp.getcred and net.inet.udp.getcred. These take
a sockaddr_in[2] (local, then remote) and return a struct ucred. Example
code for these is at:
	http://www.FreeBSD.org/~green/inetd_ident.patch
	http://www.FreeBSD.org/~green/freebsd4.c (for pidentd)

Reviewed by:	bde
1999-07-11 18:32:46 +00:00
Mike Smith
35ec852af5 Use the new tunable macros for the net.inet.tcp.tcbhashsize tunable. 1999-07-05 08:46:55 +00:00
Pierre Beyssac
5a903f8d73 In in_pcbconnect(), check the return value from in_pcbbind() and
exit on errors.

If we don't, in_pcbrehash() is called without a preceeding
in_pcbinshash(), causing a crash.

There are apparently several conditions that could cause the crash;
PR misc/12256 is only one of these.

PR:		misc/12256
1999-06-25 23:46:47 +00:00
Brian Somers
0622eafc89 Don't get caught in an infinite recursion when PKT_ALIAS_REVERSE
is set.
Document PKT_ALIAS_REVERSE.

Pointed out by:	Jonathan Hanna <jh@cr1003333-a.crdva1.bc.home.com>
PR:		12304
1999-06-22 11:20:03 +00:00
Brian Feldman
7a2aab80b0 This is the much-awaited cleaned up version of IPFW [ug]id support.
All relevant changes have been made (including ipfw.8).
1999-06-19 18:43:33 +00:00
Brian Feldman
ebe119ae6c Add RCS strings to kernel ipfilter files. 1999-06-19 11:35:41 +00:00
Brian Feldman
e214816bfc This should fix ipfilter for everyone it was broken for. CDEV_MAJOR is _not_
-1.

Noticed by: users on freebsd-current
1999-06-19 02:54:04 +00:00
Brian Feldman
f29be02190 Reviewed by: the cast of thousands
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
1999-06-17 23:54:50 +00:00
Tor Egge
23fc6cddce Close a race window where a tcp socket is closed while tcp_pcblist is
copying out tcp socket info, causing a NULL pointer to be dereferenced.
1999-06-16 19:05:17 +00:00
Ruslan Ermilov
adbdafc6b2 Don't accept divert/tee/pipe rules without corresponding option.
PR:		10324
Reviewed by:	luigi
1999-06-11 11:27:35 +00:00
Peter Wemm
9c9906e912 Plug a mbuf leak in tcp_usr_send(). pru_send() routines are expected
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt.  This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion.  This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.

Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system and that it
was the actual cause of the leak and looks like it fixes it.  The machine
in question was loosing (from memory) about 150 mbufs per hour under
load and a change similar to this stopped it.  (Don't blame Jayanth
for this patch though)

An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened.  However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code.  There are other things that call pru_send()
directly though.

Problem originally noted by:  John Plevyak <jplevyak@inktomi.com>
1999-06-04 02:27:06 +00:00
Poul-Henning Kamp
2447bec829 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
Poul-Henning Kamp
4e2f199e0c This commit should be a extensive NO-OP:
Reformat and initialize correctly all "struct cdevsw".

        Initialize the d_maj and d_bmaj fields.

        The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format.  Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.
1999-05-30 16:53:49 +00:00
David Greenman
9a039a5fec Added net.inet.tcp.path_mtu_discovery variable which when set to 0
(default 1) disables PMTUD globally. Although PMTUD can be disabled in
the standard case by locking the MTU on a static route (including the
default route), this method doesn't work in the face of dynamic routing
protocols like gated.
1999-05-27 12:24:21 +00:00
David Greenman
afed137543 Made net.inet.ip.intr_queue_maxlen writeable. 1999-05-27 12:20:33 +00:00
Luigi Rizzo
e142fadecb close pr 10889:
+ add a missing call to dn_rule_delete() when flushing firewall
  rules, thus preventing possible panics due to dangling pointers
  (this was already done for single rule deletes).
+ improve "usage" output in ipfw(8)
+ add a few checks to ipfw pipe parameters and make it a bit more
  tolerant of common mistakes (such as specifying kbit instead of Kbit)

PR: kern/10889
Submitted by: Ruslan Ermilov
1999-05-24 10:01:22 +00:00
Brian Somers
6961f3da13 brucify
Mentioned by: sprice@hiwaay.net
1999-05-23 13:52:05 +00:00
Eivind Eklund
5164f52665 Make incoming packets work as keepalives, too. This should fix problems
for some games.

Notified of problem by:	tim@turbinegames.com
1999-05-20 20:20:24 +00:00
Peter Wemm
e2bd2a1bb7 "fix" warning. This still needs to be kld-ified some day (or removed). 1999-05-11 16:07:16 +00:00
Peter Wemm
ecbc643aca Pre-declare struct proc to avoid 'inside param list' warnings. 1999-05-08 14:28:52 +00:00
Peter Wemm
bb6f0e396b Fix two warnings; and note a problem where a pointer is stored in an
int variable - this can't work on an Alpha.
1999-05-06 22:08:57 +00:00
Peter Wemm
dfd5dee1b0 Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
1999-05-06 18:13:11 +00:00
Luigi Rizzo
eaa726bed6 Free the dummynet descriptor in ip_dummynet, not in the called
routines. The descriptor contains parameters which could be used
within those routines (eg. ip_output() ).

On passing, add IPPROTO_PGM entry to netinet/in.h
1999-05-04 16:20:33 +00:00
Brian Somers
f1dfc9571e Add missing ``.''. 1999-05-04 10:56:13 +00:00
Luigi Rizzo
a7c219496c forgot passing the right pointer to dst to dummynet_io().
(-stable and releng2 were already safe).
Debugged-By: phk
1999-05-04 09:26:12 +00:00
Luigi Rizzo
44f1bb1a55 assorted dummynet cleanup:
+ plug an mbuf leak when dummynet used with bridging
 + make prototype of dummynet_io consistent with usage
 + code cleanup so that now bandwidth regulation is precise to the
   bit/s and not to (8*HZ) bit/s as before.
1999-05-04 07:30:08 +00:00
Bill Fumerola
3d177f465a Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Dmitrij Tejblum
604359cf9b s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/.
(Edited automatically)
1999-04-28 10:54:24 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Luigi Rizzo
1e83f7627b Make one pass through the firewall the default.
Multiple pass (which only affects dummynet) is too confusing.
1999-04-26 14:57:24 +00:00
Andrey A. Chernov
3879597f4f so_linger is in seconds, not in 1/HZ
PR: 11252
Submitted by: Martin Kammerhofer <dada@sbox.tu-graz.ac.at>
1999-04-24 18:25:35 +00:00
Dmitrij Tejblum
9d8da4723b Use pointer arithmetic as appropriate. 1999-04-24 13:23:48 +00:00
Luigi Rizzo
88a5354ece postpone the sending of IGMP LEAVE msg to after deleting the
mc address from the address list. The latter operation on some
hardware resets the card, potentially canceling the pending LEAVE
pkt.
1999-04-24 12:28:51 +00:00
Luoqi Chen
46d28b4462 Work around an egcs optimizer bug (i386). This should fix the active ftp
hang problem. A bug report has been sent to cygnus.
1999-04-21 21:28:01 +00:00
Peter Wemm
3a8e3ea548 s/IPFIREWALL_MODULE/KLD_MODULE/ 1999-04-20 14:29:59 +00:00
Peter Wemm
66e55756b5 Tidy up some stray / unused stuff in the IPFW package and friends.
- unifdef -DCOMPAT_IPFW  (this was on by default already)
- remove traces of in-kernel ip_nat package, it was never committed.
- Make IPFW and DUMMYNET initialize themselves rather than depend on
  compiled-in hooks in ip_init().  This means they initialize the same
  way both in-kernel and as kld modules.  (IPFW initializes now :-)
1999-04-20 13:32:06 +00:00
Peter Wemm
d95939af7a Zap LKM option and support. Farewell old friend. 1999-04-19 14:19:52 +00:00
Peter Wemm
a9013e8a77 Convert the dummynet lkm code to be kld aware (this isn't actually used
anywhere that I can see).
1999-04-17 11:09:08 +00:00
Peter Wemm
ffab0b4be1 Oops, forgot this part of lkm code that's been replaced with kld. 1999-04-17 08:56:38 +00:00
Eivind Eklund
243c4c6d7f Better handling for ARP/source routing on Token Ring
Submitted by:	Larry Lile <lile@stdio.com>
1999-04-15 17:58:24 +00:00
Eivind Eklund
66235db57f Staticize. 1999-04-11 02:50:42 +00:00
Julian Elischer
29089b519d Two cosmetic changes, one a typo and the other, a clarification. 1999-04-07 22:22:06 +00:00
Nick Sayer
a44e3914d5 Merge from RELENG_2_2, per luigi. Fixes the ntoh?() issue for the
firewall code when called from the bridge code.

PR:             10818
Submitted by:   nsayer
Obtained from:  luigi
1999-03-30 23:45:34 +00:00
Luigi Rizzo
6d41201fc5 Use the correct length from the mbuf header instead of the one from
the IP header (this would not work for bridged packets).
This has been fixed long ago in the 2.2 branch.

Problem noticed by: a few people
Fix suggested by: Remy Nonnenmacher
1999-03-26 14:15:59 +00:00
Brian Somers
42889ed1d5 PacketAliasProxyRule takes a const char *
Reminded by: bde
1999-03-25 06:48:05 +00:00
Brian Somers
942759e756 Add a ``const'' and remove some inconsistent prototype args. 1999-03-24 20:28:58 +00:00
Luigi Rizzo
efce68a2e9 add missing #include "opt_bdg.h" 1999-03-24 12:43:39 +00:00
Bill Fumerola
26bb956563 Remove duplicate line.
Reviewed by:	eivind
1999-03-23 23:01:15 +00:00
Luigi Rizzo
f0a53591ad Fix a dummynet bug caused by passing a bad next hop address (the
symptom was the msg "arp failure -- host is not on local network" that
some user have seen on multihomed machines.
Bug tracked down by Emmanuel Duros
1999-03-16 12:06:11 +00:00
Julian Elischer
ed1ff184f3 Fix the 'fwd' option to ipfw when asked to divert to another machine.
also rely less on other modules clearing static values, and clear them
in a few cases we missed before.
Submitted by: Matthew Reimer <mreimer@vpop.net>
1999-03-12 01:15:57 +00:00
Julian Elischer
fda82fc2b9 Submitted by: Larry Lile
Move the Olicom token ring driver to the officially sanctionned location of
/sys/contrib. Also fix some brokenness in the generic token ring support.

Be warned that if_dl.h has been changed and SOME programs might
like recompilation.
1999-03-10 10:11:43 +00:00
Brian Somers
4c32f5d217 Remove all diagnostics to stdout/stderr with #ifdef DEBUG
Statify functions in alias_nbt.c
1999-03-09 23:44:00 +00:00
Brian Somers
164928d385 Document PacketAliasPptp() and allow it to be disabled
by passing INADDR_NONE.
1999-03-07 18:13:23 +00:00
Brian Somers
b862eda467 Remove unused function stubs. 1999-03-07 15:36:58 +00:00
Brian Somers
ac8e3334de Mention that PacketAliasProxyRule() doesn't accept host names,
just IP numbers.
1999-03-07 15:02:22 +00:00
Archie Cobbs
94446a2ed6 When an incoming packet is reflected back as an ICMP reply, make sure we
zero "m->m_pkthdr.rcvif", otherwise ipfw may wrongly match the outgoing packet.
PR:		kern/9723
Submitted by:	David Malone <dwmalone@maths.tcd.ie>
1999-03-06 23:10:42 +00:00
Brian Somers
619d1a30a1 Document PacketAliasProxyRule() and fix a typo. 1999-03-06 21:58:43 +00:00
Garrett Wollman
4022a21d8b Move kernel-only declaration inside #ifdef KERNEL section. 1999-03-06 04:51:41 +00:00
Bill Paul
51e1b50529 arprequest() allocates an mbuf with m_gethdr() but does not initialize
m->m_pkthdr.rcvif to NULL. Bad arprequest(). No biscuit.
1999-03-04 04:03:57 +00:00
Brian Somers
7d96f4efd2 Version 3.0: January 1, 1999
- Transparent proxying support added.
    - PPTP redirecting support added based on patches
      contributed by Dru Nelson <dnelson@redwoodsoft.com>.

Submitted by: Charles Mott <cmott@srv.net>
1999-02-27 02:16:01 +00:00
Dag-Erling Smørgrav
1b968362aa Add support for stealth forwarding (forwarding packets without touching
their ttl). This can be used - in combination with the proper ipfw
incantations - to make a firewall or router invisible to traceroute
and other exploration tools.

This behaviour is controlled by a sysctl variable (net.inet.ip.stealth)
and hidden behind a kernel option (IPSTEALTH).

Reviewed by:	eivind, bde
1999-02-22 18:19:57 +00:00
Julian Elischer
722012cc0c World, I'd like you to meet the first FreeBSD token Ring driver.
This  is for various Olicom cards. An IBM driver is following.
This patch also adds support to tcpdump to decode packets on tokenring.
Congratulations to the proud father.. (below)

Submitted by:	Larry Lile <lile@stdio.com>
1999-02-20 11:18:00 +00:00
Luigi Rizzo
17458d3570 avoid panic with pkts larger than MTU and DF set coming out of a pipe. 1999-02-19 18:32:55 +00:00
Doug Rabson
ce02431ffa * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
Garrett Wollman
cc766e041e After wading in the cesspool of ip_input for an hour, I have managed to
convince myself that nothing will break if we permit IP input while
interface addresses are unconfigured.  (At worst, they will hit some
ULP's PCB scan and fail if nobody is listening.)  So, remove the restriction
that addresses must be configured before packets can be input.  Assume
that any unicast packet we receive while unconfigured is potentially ours.
1999-02-09 16:55:46 +00:00
Julian Elischer
a0c091ad1d remove leftover garbage line. 1999-02-08 05:53:39 +00:00
Julian Elischer
b0935ca284 Fix for PR 9309.
Divert was not feeding clean data to ifa_ifwithaddr() so it was
giving bad results.
Submitted by: kseel <kseel@utcorp.com>, Ruslan Ermilov <ru@ucb.crimea.ua>
1999-02-08 05:48:46 +00:00
Bill Fenner
51b7b33769 Use snd_nxt, not rcv_nxt, when calculating the ISS during TIME_WAIT.
This was missed in the 4.4-Lite2 merge.

Noticed by:	Mohan Parthasarathy <Mohan.Parthasarathy@eng.Sun.COM> and
		jayanth@loc201.tandem.com (vijayaraghavan_jayanth)
		on the tcp-impl mailing list.
1999-02-06 00:47:45 +00:00
Mike Smith
ea95d6917a Nuke all the stupid ffs() stuff and use powerof2() instead.
Submitted by:	Bruce Evans <bde@zeta.org.au>
1999-02-04 03:27:43 +00:00
Mike Smith
609b6256fd Fix power-of-2 check for the TCB hash size.
Submitted by:	Brian Feldman <green@unixhelp.org>
1999-02-04 03:02:56 +00:00
Mike Smith
aa1458495d Make TCBHASHSIZE a boot-time tunable as well, taking its value from the
variable net.inet.tcp.tcbhashsize.

Requested by:	David Filo <filo@yahoo-inc.com>
1999-02-03 08:59:30 +00:00
Matthew Dillon
831a80b0d5 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 22:42:27 +00:00
Archie Cobbs
85abbaa800 Move kernel-only declarations to within #ifdef KERNEL
Prompted by:	gcc warnings when compiling /sbin/ipfw
1999-01-23 23:59:50 +00:00
Garrett Wollman
92af003dd8 Don't forward unicast packets received via link-layer multicast.
Suggested by: fenner
Original complaint: Shiva Shenoy <Shiva.Shenoy@yagosys.com>
1999-01-22 16:50:45 +00:00
Bill Fenner
b0acefa8d4 Add a flag, passed to pru_send routines, PRUS_MORETOCOME. This
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.

Based on:	"Justin C. Walker" <justin@apple.com>'s description of Apple's
		fix for this problem.
1999-01-20 17:32:01 +00:00
Bill Fenner
cc133fa302 Fix bug in last commit (la was used uninitialized if no route was passed in). 1999-01-19 23:17:03 +00:00
Bill Fenner
01b7c0826f Use dynamic memory allocation instead of mbuf's for multicast routing
state.

Note: this requires a recompilation of netstat (but netstat has been
broken since rev 1.52 of ip_mroute.c anyway)

Obtained from:	Significantly based on Steve McCanne's
		<mccanne@cs.berkeley.edu> work for BSD/OS
1999-01-18 02:06:59 +00:00
Bill Fenner
9bb02c7b92 Rename igmp's MALLOC; it doesn't have anything to do with multicast routing. 1999-01-18 01:56:31 +00:00
Bill Fenner
e7bc5f272b If arpresolve() gets passed a route with a null llinfo, call
arplookup() to try again.  This gets rid of at least one user's
 "arpresolve: can't allocate llinfo" errors, and arplookup() gives
 better error messages to help track down the problem if there really
 is a problem with the routing table.
1999-01-18 01:54:36 +00:00
Eivind Eklund
0189b9a1fd ... _and_ the (void*) casts for %p. Next, I'll forget my own name :-( 1999-01-12 16:43:52 +00:00
Eivind Eklund
e23c7d8612 Avoid unnecessary GCCism - I hadn't noticed the __unused macro. 1999-01-12 16:40:57 +00:00
Eivind Eklund
19e5ea73d0 * Print pointers using the correct type (%p) instead of %x.
* Use the correct type for timeout function.
* Add missing #include.
1999-01-12 12:27:54 +00:00
Eivind Eklund
dee383e043 Add #ifdef's to avoid unused label warning in some cases. 1999-01-12 12:25:00 +00:00
Eivind Eklund
ab54eac7a1 Remove unused statics. 1999-01-12 12:16:50 +00:00
Luigi Rizzo
4180488519 Add a missing bzero which could be the source of instability
problems reported recently (the rtentry pointer in the dummynet
queue was not initialized in all cases, resulting in spurious
rt_refcnt decreases in the lucky cases, and memory trashing in
other cases.
1999-01-11 11:08:07 +00:00
Luigi Rizzo
a1a2de5139 Remove check from where arp replies are coming from -- when doing bridging,
interfaces are used in clusters so the check does not apply.
1999-01-10 17:40:10 +00:00
Brian Somers
a2743da670 If we can't open alias.log, don't try to write to the
resulting NULL FILE *.
PR:	9403
1999-01-10 02:05:13 +00:00
Luigi Rizzo
cc277540ab Partial fix for when ipfw is used with bridging. Bridged packets
have all fields in network order, whereas ipfw expects some to be
in host order. This resulted in some incorrect matching, e.g. some
packets being identified as fragments, or bandwidth not being
correctly enforced.
NOTE: this only affects bridge+ipfw, normal ipfw usage was already
correct).

Reported-By: Dave Alden and others.
1998-12-31 07:43:29 +00:00
Luigi Rizzo
ea1f41f65f Remove some unused variables. 1998-12-31 07:35:49 +00:00
Luigi Rizzo
549db363a7 'ip_fw_head' and 'M_IPFW' are also used in ip_dummynet so cannot be
static...
Reported by: Dave Alden
1998-12-22 20:38:06 +00:00
Luigi Rizzo
af38c68c1e Recover from previous dummynet screwup 1998-12-21 22:40:54 +00:00
Luigi Rizzo
f0f6d6434d Restore 1.82->1.83 change deleted by mistake< per Bruce suggestion 1998-12-21 21:36:40 +00:00
Bill Fenner
0b6205feca Add missing "break"s to allow multicast routing to work.
Submitted by:	Amancio Hasty <hasty@rah.star-gate.com>
1998-12-16 18:07:11 +00:00
Luigi Rizzo
b715f178c6 Last bits (i think) of dummynet for -current. 1998-12-14 18:09:13 +00:00
Matthew Dillon
374fad8b17 Reviewed by: freebsd-current
Add bounds checking to netbios NS packet resolving code.  This should
    prevent natd from crashing on badly formed netbios packets (as might be
    heard when the machine is sitting on a cable modem or certain DSL
    networks), and also closes potential security holes that might have
    exploited the lack of bounds checking in the previous version of the
    code.
1998-12-14 02:25:32 +00:00
Matthew Dillon
4462d1a56f PR: kern/8990
If timer calculation results in degenerate value (0), force it to 1
    to avoid divide-by-zero panic later on in calls to IGMP_RANDOM_DELAY().
    I considered simply adding 1 to the timer calculation, but was unsure
    if the calculation was part of the IGMP standard or not so did not want
    to mess with it for all cases.
1998-12-12 21:45:49 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Eivind Eklund
6572231d20 Clean up some pointer usage. 1998-12-07 05:41:10 +00:00
Archie Cobbs
2127f26023 Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
Matthew Dillon
5fce7fc47f Cleanup icmp_var.h, make icmp bandlim sysctl permanent but if ICMP_BANDLIM
option not defined the sysctl int value is set to -1 and read-only.

    #ifdef KERNEL's added appropriately to wall off visibility of kernel
    routines from user code.
1998-12-04 04:21:25 +00:00
Matthew Dillon
a3e7459d60 Obtained from: "Andrey A. Chernov" <ache@nagual.pp.ru>
Quick add #ifdef KERNEL for ICMP_BANDLIM option so userland program
     can #include icmp_var.h
1998-12-04 03:49:18 +00:00
Matthew Dillon
51508de112 Reviewed by: freebsd-current
Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl.  If option
    is specified in kernel config, icmplim defaults to 100 pps.  Setting it
    to 0 will disable the feature.  This feature limits ICMP error responses
    for packets sent to bad tcp or udp ports, which does a lot to help the
    machine handle network D.O.S. attacks.

    The kernel will report packet rates that exceed the limit at a rate of
    one kernel printf per second.  There is one issue in regards to the
    'tail end' of an attack... the kernel will not output the last report
    until some unrelated and valid icmp error packet is return at some
    point after the attack is over.  This is a minor reporting issue only.
1998-12-03 20:23:21 +00:00
Eivind Eklund
29be051d68 Staticize some more. 1998-11-26 18:54:52 +00:00
John Polstra
b2052ac8cf Fix a couple of typos. 1998-11-19 18:07:28 +00:00
Doug Rabson
73c8631124 Remove stale references to ih_next and ih_prev.
Pointed out by: Roman V. Palagin <romanp@wuppy.rcs.ru>
1998-11-17 10:53:37 +00:00
Doug Rabson
7a94149e37 Make the previous fix more portable.
Requested by: bde
1998-11-16 08:27:36 +00:00
Guido van Rooij
d285db5598 The below patch helps to reduce the leakage of internal socket information
when a TCP "stealth" scan is directed at a *BSD box by ensuring the window
is 0 for all RST packets generated through tcp_respond()
Reviewed by:	Don Lewis <Don.Lewis@tsc.tdk.com>
Obtained from:	Bugtraq (from: Darren Reed <avalon@COOMBS.ANU.EDU.AU>)
1998-11-15 21:35:09 +00:00
Doug Rabson
48a39a495a Fix printf format errors on alpha. 1998-11-15 18:10:14 +00:00
Bruce Evans
c25ded316f Finished updating module event handlers to be compatible with
modeventhand_t.
1998-11-15 15:33:52 +00:00
David Greenman
9ec944bdb0 Be sure to pullup entire IP header when dealing with fragment packets. 1998-11-11 21:17:59 +00:00
Peter Wemm
1c5bb3eaa1 add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE() 1998-11-10 09:16:29 +00:00
Doug Rabson
e3f0338ead Some optimisations to the fragment reassembly code.
Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>
1998-10-27 09:19:03 +00:00
Doug Rabson
afb3fdba7c Fix a bug in the new fragment reassembly code which was tickled by recieving
a fragment which wholly overlapped one or more existing fragments.

Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>
1998-10-27 09:11:41 +00:00
Peter Wemm
aa855a598d *gulp*. Jordan specifically OK'ed this..
This is the bulk of the support for doing kld modules.  Two linker_sets
were replaced by SYSINIT()'s.  VFS's and exec handlers are self registered.
kld is now a superset of lkm.  I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
1998-10-16 03:55:01 +00:00
Doug Rabson
e0dd9c3ed8 Dike out some obsolete defines which referenced ih_next and ih_prev from
struct ipovly (they don't exist anymore because they don't work when
pointers are 64bit).
1998-09-26 14:26:59 +00:00