Commit graph

481 commits

Author SHA1 Message Date
Guido van Rooij
d23d475fb4 Consider the following situation:
1. A packet comes in that is to be forwarded
2. The destination of the packet is rewritten by some firewall code
3. The next link's MTU is too small
4. The packet has the DF bit set

Then the current code is such that instead of setting the next
link's MTU in the ICMP error, ip_next_mtu() is called and a guess
is sent as to which MTU is supposed to be tried next. This is because
in this case ip_forward() is called with srcrt set to 1. In that
case the ia pointer remains NULL but it is needed to get the MTU
of the interface the packet is to be sent out from.
Thus, we always set ia to the outgoing interface.

MFC after:	2 weeks
2007-12-02 13:00:47 +00:00
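The MTU selection described in this commit can be sketched in userland C. This is a minimal illustration under stated assumptions, not the kernel code: `needfrag_mtu()` and `next_smaller_mtu()` are hypothetical names, an `egress_mtu` of 0 stands for "outgoing interface not known", and the plateau table is the one listed in RFC 1191, section 7.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common MTU plateaus from RFC 1191, section 7.1, largest first. */
static const uint32_t mtu_plateaus[] = {
    65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68
};

/* Hypothetical helper: guess the next plateau strictly below pkt_len. */
uint32_t
next_smaller_mtu(uint32_t pkt_len)
{
    size_t n = sizeof(mtu_plateaus) / sizeof(mtu_plateaus[0]);

    for (size_t i = 0; i < n; i++) {
        if (mtu_plateaus[i] < pkt_len)
            return mtu_plateaus[i];
    }
    return mtu_plateaus[n - 1];    /* do not guess below 68 */
}

/*
 * Hypothetical helper: the MTU to report in the ICMP "fragmentation
 * needed" error.  egress_mtu == 0 means the outgoing interface is
 * not known (an assumption of this sketch).
 */
uint32_t
needfrag_mtu(uint32_t egress_mtu, uint32_t pkt_len)
{
    assert(pkt_len > 0);
    if (egress_mtu != 0)
        return egress_mtu;             /* exact value, preferred */
    return next_smaller_mtu(pkt_len);  /* fallback: only a guess */
}
```

With the outgoing interface always known, the sender learns the real next-hop MTU in one round trip instead of stepping down through the plateaus.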
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Mike Silbersack
4b421e2daa Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by:	re (kensmith)
2007-10-07 20:44:24 +00:00
Bjoern A. Zeeb
cc977adc71 Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
Also rename the related functions in a similar way.
There are no functional changes.

For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.

With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.

The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.

Discussed at:			BSDCan 2007
Best new name suggested by:	rwatson
Reviewed by:			rwatson
Approved by:			re (bmah)
2007-08-05 16:16:15 +00:00
George V. Neville-Neil
b2630c2934 Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing
2007-07-03 12:13:45 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Robert Watson
6751f8364e Remove leading spaces before tabs spotted thanks to silby using
kwrite to read ip_input.c.
2007-05-16 20:46:58 +00:00
Robert Watson
f2565d68a4 Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.
2007-05-10 15:58:48 +00:00
Robert Watson
30916a2d1d Replace a comment about RSVP/mrouting with a different but similar comment
explaining that some more locking is needed.  The routing pieces are done,
but there is an interlocking issue between optionally compiled code and
mandatory code.

Spotted by:	kris
2007-03-25 21:49:50 +00:00
Andre Oppermann
6489fe6553 Match up SYSCTL declaration style.
2007-03-19 19:00:51 +00:00
Bruce M Simpson
f8429ca2e1 In regular forwarding path, reject packets destined for 169.254.0.0/16
link-local addresses. See RFC 3927 section 2.7.
2007-02-03 06:45:51 +00:00
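The link-local test behind this change can be sketched as a mask comparison. This is a self-contained illustration; `ipv4_from_octets()`, `ipv4_is_linklocal()` and `may_forward_to()` are hypothetical names for this sketch, and addresses are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 169.254.0.0/16 in host byte order (RFC 3927). */
#define LINKLOCAL_NET   UINT32_C(0xa9fe0000)
#define LINKLOCAL_MASK  UINT32_C(0xffff0000)

/* Hypothetical helper: build a host-order IPv4 address from octets. */
uint32_t
ipv4_from_octets(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
        ((uint32_t)c << 8) | (uint32_t)d;
}

/* True if the address falls inside 169.254.0.0/16. */
bool
ipv4_is_linklocal(uint32_t addr)
{
    return (addr & LINKLOCAL_MASK) == LINKLOCAL_NET;
}

/* Hypothetical forwarding check: link-local destinations are rejected. */
bool
may_forward_to(uint32_t dst)
{
    return !ipv4_is_linklocal(dst);
}
```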
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Julian Elischer
010b65f54a revert last change.. premature.. need to wait until if_ethersubr.c
uses pfil to get to ipfw.
2006-10-21 00:16:31 +00:00
Julian Elischer
3df668cc38 Move some variables to a more likely place
and remove "temporary" stuff that is not needed any more.
2006-10-20 19:32:08 +00:00
Julian Elischer
b7522c27d2 Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was
in older versions of FreeBSD. This option is pointless as it is needed in just
about every interesting usage of forward that I have ever seen. It doesn't make
the system any safer and just wastes huge amounts of developer time
when the system doesn't behave as expected when code is moved from
4.x to 6.x or 7.x.
Reviewed by:	glebius
MFC after:	1 week
2006-08-17 00:37:03 +00:00
Max Laier
e93187482d Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing
separately.  Also use pfil hook/unhook instead of keeping the check
functions in pfil just to return there based on the sysctl.  While here fix
some whitespace on a nearby SYSCTL_ macro.
2006-05-12 04:41:27 +00:00
Pawel Jakub Dawidek
1d7d0bfe5e /tmp/cvsTXPIwQ
2006-05-05 06:24:34 +00:00
Paul Saab
4f590175b7 Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00
Oleg Bulyzhin
6edb555dbc Fix five years old bug in ip_reass(): if we are using 'full' (i.e. including
pseudo header) hardware rx checksum offloading ip_reass() fails to calculate
TCP/UDP checksum for reassembled packet correctly.  This also should fix
recent 'NFS over UDP over bge' issue exposed by if_bge.c rev. 1.123

Reviewed by:	sam (earlier version), bde
Approved by:	glebius (mentor)
MFC after:	2 weeks
2006-02-07 11:48:10 +00:00
Christian S.J. Peron
604afec496 Somewhat re-factor the read/write locking mechanism associated with the packet
filtering mechanisms to use the new rwlock(9) locking API:

- Drop the variables stored in the pfil_head structure which were specific to
  conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
  macros into pfil.h
- Move pfil list locking macros into pfil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
  of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
  hooks to be run by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
  PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
  hooks to be run, and returns if not. This check is already performed by the
  IP stacks when they call:

        if (!PFIL_HOOKED(ph))
                goto skip_hooks;

- Drop in assertion which makes sure that the number of hooks never drops
  below 0 for good measure. This in theory should never happen, and if it
  does then there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
  the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
  API instead of our home rolled version
- Convert the inlined functions to macros

Reviewed by:	mlaier, andre, glebius
Thanks to:	jhb for the new locking API
2006-02-02 03:13:16 +00:00
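The hook-count check described in this commit can be sketched without any kernel headers. This is a simplified stand-in under stated assumptions: the `_sketch` types and functions are hypothetical, locking is omitted entirely, and only the names `PFIL_HOOKED` and `ph_nhooks` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS 8

/* A hook returns 0 to pass the packet and nonzero to reject it. */
typedef int (*hook_fn)(void *pkt);

struct pfil_head_sketch {
    int     ph_nhooks;              /* number of registered hooks */
    hook_fn ph_hooks[MAX_HOOKS];
};

/* True if there are any hooks to be run. */
#define PFIL_HOOKED(ph) ((ph)->ph_nhooks > 0)

int
pfil_add_hook_sketch(struct pfil_head_sketch *ph, hook_fn fn)
{
    if (ph->ph_nhooks >= MAX_HOOKS)
        return -1;
    ph->ph_hooks[ph->ph_nhooks++] = fn;
    return 0;
}

void
pfil_remove_last_hook_sketch(struct pfil_head_sketch *ph)
{
    ph->ph_nhooks--;
    assert(ph->ph_nhooks >= 0);     /* the count must never drop below 0 */
}

/* No "are there any hooks" shortcut here: the caller checks that. */
int
pfil_run_hooks_sketch(struct pfil_head_sketch *ph, void *pkt)
{
    for (int i = 0; i < ph->ph_nhooks; i++) {
        int rv = ph->ph_hooks[i](pkt);
        if (rv != 0)
            return rv;
    }
    return 0;
}

/* Example hook for illustration: rejects every packet. */
int
drop_all_hook(void *pkt)
{
    (void)pkt;
    return 1;
}

/* Caller side: returns 0 if the packet continues, -1 if dropped. */
int
ip_input_sketch(struct pfil_head_sketch *ph, void *pkt)
{
    if (!PFIL_HOOKED(ph))
        goto skip_hooks;
    if (pfil_run_hooks_sketch(ph, pkt) != 0)
        return -1;
skip_hooks:
    return 0;
}
```

Keeping the check in the caller means the common "no filter loaded" path costs one integer comparison and no function call.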
Andre Oppermann
1dfcf0d2a3 Move the IPSEC related code blocks to their own file to unclutter
and significantly improve the readability of ip_input() and
ip_output() again.

The resulting IPSEC hooks in ip_input() and ip_output() may be
used later on for making IPSEC loadable.

This move is mostly mechanical and should preserve current IPSEC
behaviour as-is.  Nothing shall prevent improvements in the way
IPSEC interacts with the IPv4 stack.

Discussed with:	bz, gnn, rwatson; (earlier version)
2006-02-01 13:55:03 +00:00
Andre Oppermann
ab48768b20 When doing IP forwarding with [FAST_]IPSEC compiled into the kernel
ip_forward() would report back a zero MTU in ICMP needfrag messages
because on an IPSEC SP lookup failure no MTU got computed.

Fix this by changing the logic to compute a new MTU in any case if
IPSEC didn't do it.

Change MTU computation logic to use egress interface MTU if available
or the next smaller MTU compared to the current packet size instead
of falling back to a very small fixed MTU.

Fix associated comment.

PR:		kern/91412
MFC after:	3 days
2006-01-24 17:57:19 +00:00
Robert Watson
d248c7d7f5 Modify the IP fragment reassembly code so that it uses a new UMA zone,
ipq_zone, to allocate fragment headers from, rather than using cast mbuf
storage.  This was one of the few remaining uses of mbuf storage for
local data structures that relied on dtom().  Implement the resource
limit on ipq's using UMA zone limits, but preserve current sysctl
semantics using a sysctl proc.

MFC after:	3 weeks
2006-01-15 18:58:21 +00:00
Robert Watson
dfa60d9354 Staticize ipqlock, since it is local to ip_input.c.
MFC after:	3 days
2006-01-15 17:05:48 +00:00
Ruslan Ermilov
f4e9888107 Fix -Wundef.
2005-12-04 02:12:43 +00:00
Andre Oppermann
22f2c8b5db Remove 'ipprintfs' which were protected under DIAGNOSTIC. It doesn't
have any knob to enable it from userland and could only be enabled by
either setting it to 1 at compile time or through the kernel debugger.

In the future it may be brought back as KTR tracing points.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-19 17:04:52 +00:00
Andre Oppermann
ef39adf007 Consolidate all IP Options handling functions into ip_options.[ch] and
include ip_options.h into all files making use of IP Options functions.

From ip_input.c rev 1.306:
  ip_dooptions(struct mbuf *m, int pass)
  save_rte(m, option, dst)
  ip_srcroute(m0)
  ip_stripoptions(m, mopt)

From ip_output.c rev 1.249:
  ip_insertoptions(m, opt, phlen)
  ip_optcopy(ip, jp)
  ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)

No functional changes in this commit.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 20:12:40 +00:00
Andre Oppermann
780b2f698c In ip_forward() copy as much into the temporary error mbuf as we
have free space in it.  Allocate correct mbuf from the beginning.
This allows icmp_error() to quote the entire TCP header in error
messages.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 14:44:48 +00:00
Ruslan Ermilov
4a0d6638b3 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
Andre Oppermann
e0aec68255 Use the correct mbuf type for MGET().
2005-08-30 16:35:27 +00:00
Robert Watson
dd5a318ba3 Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros.  For now, this is a recursive mutex
due to code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		10 days
2005-08-03 19:29:47 +00:00
Robert Watson
b77634d046 Remove spl() calls from ip_slowtimo(), as IP fragment queue locking was
merged several years ago.

Submitted by:	gnn
MFC after:	1 day
2005-07-19 12:14:22 +00:00
Andre Oppermann
c773494edd Pass icmp_error() the MTU argument directly instead of
an interface pointer.  This simplifies a couple of uses
and removes some XXX workarounds.
2005-05-04 13:09:19 +00:00
Maxim Konovalov
800af1fb81 o Nano optimize ip_reass() code path for the first fragment: do not
try to reassemble the packet from the fragments queue with the only
fragment, finish with the first fragment as soon as we create a queue.

Spotted by:	Vijay Singh

o Drop the fragment if maxfragsperpacket == 0, no chances we
will be able to reassemble the packet in future.

Reviewed by:	silby
2005-04-08 10:25:13 +00:00
Sam Leffler
6a9909b5e6 plug resource leak
Noticed by:	Coverity Prevent analysis tool
2005-03-16 05:27:19 +00:00
Sam Leffler
db77984c5b fix potential invalid index into ip_protox array
Noticed by:	Coverity Prevent analysis tool
2005-02-23 00:38:12 +00:00
Andre Oppermann
099dd0430b Bring back the full packet destination manipulation for 'ipfw fwd'
with the kernel compile time option:

 options IPFIREWALL_FORWARD_EXTENDED

This option has to be specified in addition to IPFIREWALL_FORWARD.

With this option even packets targeted for an IP address local
to the host can be redirected.  All restrictions to ensure proper
behaviour for locally generated packets are turned off.  Firewall
rules have to be carefully crafted to make sure that things like
PMTU discovery do not break.

Document the two kernel options.

PR:		kern/71910
PR:		kern/73129
MFC after:	1 week
2005-02-22 17:40:40 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00
Robert Watson
024105493d Prefer (NULL) spelling of (0) for pointers.
MFC after:	3 days
2005-01-30 19:29:47 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes
2005-01-07 01:45:51 +00:00
Mike Silbersack
5f311da2cc Port randomization leads to extremely fast port reuse at high
connection rates, which is causing problems for some users.

To retain the security advantage of random ports and ensure
correct operation for high connection rate users, disable
port randomization during periods of high connection rates.

Whenever the connection rate exceeds randomcps (10 by default),
randomization will be disabled for randomtime (45 by default)
seconds.  These thresholds may be tuned via sysctl.

Many thanks to Igor Sysoev, who proved the necessity of this
change and tested many preliminary versions of the patch.

MFC After:	20 seconds
2005-01-02 01:50:57 +00:00
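The rate-based switch described in this commit can be sketched as a small state machine driven by a once-per-second tick. This is an illustration only: the struct and function names are hypothetical, and just `randomcps` and `randomtime` (with their defaults of 10 and 45) are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct port_rand_state {
    unsigned randomcps;          /* connections/second threshold */
    unsigned randomtime;         /* seconds to keep randomization off */
    unsigned conns_this_second;  /* connections seen since last tick */
    unsigned stop_seconds_left;  /* > 0 while randomization is off */
};

/* Call once for every new outgoing connection. */
void
port_rand_note_connection(struct port_rand_state *s)
{
    assert(s != NULL);
    s->conns_this_second++;
}

/* Call once per second. */
void
port_rand_tick(struct port_rand_state *s)
{
    assert(s != NULL);
    if (s->conns_this_second > s->randomcps)
        s->stop_seconds_left = s->randomtime;   /* (re)start quiet period */
    else if (s->stop_seconds_left > 0)
        s->stop_seconds_left--;
    s->conns_this_second = 0;
}

/* True if the next port should be picked at random. */
bool
port_rand_enabled(const struct port_rand_state *s)
{
    assert(s != NULL);
    return s->stop_seconds_left == 0;
}
```

In this sketch a sustained burst keeps resetting the quiet period, so randomization only resumes after `randomtime` consecutive seconds at or below the threshold.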
Andre Oppermann
de38924dc0 Support for dynamically loadable and unloadable IP protocols in the ipmux.
With pr_proto_register() it has become possible to dynamically load protocols
within the PF_INET domain.  However the PF_INET domain has a second important
structure called ip_protox[] that is derived from the 'struct protosw inetsw[]'
and takes care of the de-multiplexing of the various protocols that ride on
top of IP packets.

The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[]
array mux in a consistent and easy way.  To register a protocol within
ip_protox[] the existence of a corresponding and matching protocol definition
in inetsw[] is required.  The function does not allow to overwrite an already
registered protocol.  The unregister function simply replaces the mux slot with
the default index pointer to IPPROTO_RAW as it was previously.
2004-10-19 15:45:57 +00:00
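The register/unregister rules described in this commit can be sketched with a small demultiplexing table. This is a userland illustration under stated assumptions: the `proto_mux_*` names, the return codes, and the contents of the switch table are hypothetical and do not reproduce the kernel's `ipproto_register()` interface.

```c
#include <assert.h>
#include <stddef.h>

#define PROTO_MAX 256
#define RAW_SLOT  0      /* default slot: index of the raw entry */

enum mux_result { MUX_OK, MUX_BAD_NUMBER, MUX_EXISTS, MUX_NO_PROTO, MUX_NOT_FOUND };

struct proto_entry {
    int         number;
    const char *name;
};

/* Stand-in for the protocol switch; slot 0 is the raw fallback. */
static const struct proto_entry proto_switch[] = {
    { 0, "raw" }, { 6, "tcp" }, { 17, "udp" }, { 132, "sctp" }
};
#define PROTO_SWITCH_LEN (sizeof(proto_switch) / sizeof(proto_switch[0]))

/* Maps a protocol number to an index into proto_switch[]. */
static unsigned char proto_mux[PROTO_MAX];

void
proto_mux_init(void)
{
    for (size_t i = 0; i < PROTO_MAX; i++)
        proto_mux[i] = RAW_SLOT;
}

int
proto_mux_lookup(int number)
{
    assert(number >= 0 && number < PROTO_MAX);
    return proto_mux[number];
}

enum mux_result
proto_mux_register(int number)
{
    if (number <= 0 || number >= PROTO_MAX)
        return MUX_BAD_NUMBER;        /* would index outside the table */
    if (proto_mux[number] != RAW_SLOT)
        return MUX_EXISTS;            /* never overwrite a registration */
    for (size_t i = 1; i < PROTO_SWITCH_LEN; i++) {
        if (proto_switch[i].number == number) {
            proto_mux[number] = (unsigned char)i;
            return MUX_OK;
        }
    }
    return MUX_NO_PROTO;              /* no matching switch entry */
}

enum mux_result
proto_mux_unregister(int number)
{
    if (number <= 0 || number >= PROTO_MAX)
        return MUX_BAD_NUMBER;
    if (proto_mux[number] == RAW_SLOT)
        return MUX_NOT_FOUND;
    proto_mux[number] = RAW_SLOT;     /* fall back to the raw handler */
    return MUX_OK;
}
```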
Max Laier
d6a8d58875 Add an additional struct inpcb * argument to pfil(9) in order to enable
passing along socket information. This is required to work around a LOR with
the socket code which results in an easy reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a separate (later) commit.

This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
foreseeable future.

Suggested by:		rwatson
A lot of work by:	csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by:		rwatson, csjp
Tested by:		-pf, -ipfw, LINT, csjp and myself
MFC after:		3 days

LOR IDs:		14 - 17 (not fixed yet)
2004-09-29 04:54:33 +00:00
Maxim Konovalov
4bc37f9836 o Turn net.inet.ip.check_interface sysctl off by default.
When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in
rev. 1.130.2.17 ip_input.c it was 1 by default but shortly changed to
0 (accidentally?) in rev. 1.130.2.20 in RELENG_4 only.  Along with the
fact this knob is not documented it breaks POLA especially in bridge
environment.

OK'ed by:	andre
Reviewed by:	-current
2004-09-24 12:18:40 +00:00
Andre Oppermann
db09bef308 Fix an out of bounds write during the initialization of the PF_INET protocol
family to the ip_protox[] array.  The protocol number of IPPROTO_DIVERT is
larger than IPPROTO_MAX and was initializing memory beyond the array.
Catch all these kinds of errors by ignoring protocols that are higher than
IPPROTO_MAX or 0 (zero).

Add more comments to ip_init().
2004-09-16 18:33:39 +00:00
Andre Oppermann
76ff6dcf46 Clarify some comments for the M_FASTFWD_OURS case in ip_input().
2004-09-15 20:17:03 +00:00
Andre Oppermann
e098266191 Remove the last two global variables that are used to store packet state while
it travels through the IP stack.  This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.

The IP source route options of a packet are now stored in a mtag instead of the
global variable.
2004-09-15 20:13:26 +00:00
Andre Oppermann
c21fd23260 Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option.  All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over.  This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
2004-08-27 15:16:24 +00:00
Andre Oppermann
e4c97eff8e Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts
and to be able to disable ipfw if it was compiled directly into the kernel.
2004-08-19 17:38:47 +00:00
Robert Watson
0f48e25b63 Fix build of ip_input.c with "options IPSEC" -- the "pass:" label
is used with both FAST_IPSEC and IPSEC, but was defined for only
FAST_IPSEC.
2004-08-18 03:11:04 +00:00
Andre Oppermann
9b932e9e04 Convert ipfw to use PFIL_HOOKS. This change is transparent to userland
and preserves the ipfw ABI.  The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.

However there are many changes to how ipfw and its add-ons are handled:

 In general ipfw is now called through the PFIL_HOOKS and most associated
 magic, that was in ip_input() or ip_output() previously, is now done in
 ipfw_check_[in|out]() in the ipfw PFIL handler.

 IPDIVERT is entirely handled within the ipfw PFIL handlers.  A packet to
 be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
 reassembly.  If not, or all fragments arrived and the packet is complete,
 divert_packet is called directly.  For 'tee' no reassembly attempt is made
 and a copy of the packet is sent to the divert socket unmodified.  The
 original packet continues its way through ip_input/output().

 ipfw 'forward' is done via m_tag's.  The ipfw PFIL handlers tag the packet
 with the new destination sockaddr_in.  A check if the new destination is a
 local IP address is made and the m_flags are set appropriately.  ip_input()
 and ip_output() have some more work to do here.  For ip_input() the m_flags
 are checked and a packet for us is directly sent to the 'ours' section for
 further processing.  Destination changes on the input path are only tagged
 and the 'srcrt' flag to ip_forward() is set to disable destination checks
 and ICMP replies at this stage.  The tag is going to be handled on output.
 ip_output() again checks for m_flags and the 'ours' tag.  If found, the
 packet will be dropped back to the IP netisr where it is going to be picked
 up by ip_input() again and then directly sent to the 'ours' section.  When
 only the destination changes, the route's 'dst' is overwritten with the
 new destination from the forward m_tag.  Then it jumps back at the route
 lookup again and skips the firewall check because it has been marked with
 M_SKIP_FIREWALL.  ipfw 'forward' has to be compiled into the kernel with
 'option IPFIREWALL_FORWARD' to enable it.

 DUMMYNET is entirely handled within the ipfw PFIL handlers.  A packet for
 a dummynet pipe or queue is directly sent to dummynet_io().  Dummynet will
 then inject it back into ip_input/ip_output() after it has served its time.
 Dummynet packets are tagged and will continue from the next rule when they
 hit the ipfw PFIL handlers again after re-injection.

 BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
 they did before.  Later this will be changed to dedicated ETHER PFIL_HOOKS.

More detailed changes to the code:

 conf/files
	Add netinet/ip_fw_pfil.c.

 conf/options
	Add IPFIREWALL_FORWARD option.

 modules/ipfw/Makefile
	Add ip_fw_pfil.c.

 net/bridge.c
	Disable PFIL_HOOKS if ipfw for bridging is active.  Bridging ipfw
	is still directly invoked to handle layer2 headers and packets would
	get a double ipfw when run through PFIL_HOOKS as well.

 netinet/ip_divert.c
	Removed divert_clone() function.  It is no longer used.

 netinet/ip_dummynet.[ch]
	Neither the route 'ro' nor the destination 'dst' need to be stored
	while in dummynet transit.  Structure members and associated macros
	are removed.

 netinet/ip_fastfwd.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_fw.h
	Removed 'ro' and 'dst' from struct ip_fw_args.

 netinet/ip_fw2.c
	(Re)moved some global variables and the module handling.

 netinet/ip_fw_pfil.c
	New file containing the ipfw PFIL handlers and module initialization.

 netinet/ip_input.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.  ip_forward() no longer requires
	the 'next_hop' struct sockaddr_in argument.  Disable early checks
	if 'srcrt' is set.

 netinet/ip_output.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_var.h
	Add ip_reass() as general function.  (Used from ipfw PFIL handlers
	for IPDIVERT.)

 netinet/raw_ip.c
	Directly check if ipfw and dummynet control pointers are active.

 netinet/tcp_input.c
	Rework the 'ipfw forward' to local code to work with the new way of
	forward tags.

 netinet/tcp_sack.c
	Remove include 'opt_ipfw.h' which is not needed here.

 sys/mbuf.h
	Remove m_claim_next() macro which was exclusively for ipfw 'forward'
	and is no longer needed.

Approved by:	re (scottl)
2004-08-17 22:05:54 +00:00
David Malone
1f44b0a1b5 Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD
have already done this, so I have styled the patch on their work:

        1) introduce a ip_newid() static inline function that checks
        the sysctl and then decides if it should return a sequential
        or random IP ID.

        2) named the sysctl net.inet.ip.random_id

        3) IPv6 flow IDs and fragment IDs are now always random.
        Flow IDs and frag IDs are significantly less common in the
        IPv6 world (ie. rarely generated per-packet), so there should
        be smaller performance concerns.

The sysctl defaults to 0 (sequential IP IDs).

Reviewed by:	andre, silby, mlaier, ume
Based on:	NetBSD
MFC after:	2 months
2004-08-14 15:32:40 +00:00
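The sysctl-controlled choice in point 1 can be sketched as below. This is a userland illustration, not the kernel function: the `_sketch` names are hypothetical, a plain variable stands in for net.inet.ip.random_id, and `rand()` is only a placeholder for whatever generator the kernel uses (it is not suitable for security purposes and may yield fewer than 16 random bits on some platforms).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the net.inet.ip.random_id sysctl; 0 = sequential. */
int random_id_enabled = 0;

/* Next value handed out in sequential mode. */
uint16_t next_sequential_id = 0;

/* Hypothetical setter standing in for a sysctl write. */
void
ip_set_random_id_sketch(int on)
{
    assert(on == 0 || on == 1);
    random_id_enabled = on;
}

/* Return a new IP ID, sequential or random depending on the knob. */
uint16_t
ip_newid_sketch(void)
{
    if (random_id_enabled)
        return (uint16_t)(rand() & 0xffff);   /* placeholder generator */
    return next_sequential_id++;              /* wraps at 65535 */
}
```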
Andre Oppermann
9d804f818c Fix two cases of incorrect IPQ_UNLOCK'ing in the merged ip_reass() function.
The first one was going to 'dropfrag', which unlocks the IPQ, before the lock
was acquired; the second one was doing an unlock and then a 'goto dropfrag' which
led to a double-unlock.

Tripped over by:	des
2004-08-12 08:37:42 +00:00
Andre Oppermann
0b17fba7bc Consistently use NULL for pointer comparisons.
2004-08-11 10:46:15 +00:00
Andre Oppermann
bb7c5b3055 Make a comment that IP source routing is not SMP and PREEMPTION safe.
2004-08-09 16:17:37 +00:00
Andre Oppermann
f0cada84b1 o Move all parts of the IP reassembly process into the function ip_reass() to
make it fully self-contained.
o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len
  including the IP header.
o Computation of the delayed checksum is moved into divert_packet().

Reviewed by:	silby
2004-08-03 12:31:38 +00:00
Brian Somers
0ac4013324 Change the following environment variables to kernel options:
    bootp -> BOOTP
    bootp.nfsroot -> BOOTP_NFSROOT
    bootp.nfsv3 -> BOOTP_NFSV3
    bootp.compat -> BOOTP_COMPAT
    bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit.  It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone
2004-07-08 22:35:36 +00:00
Brian Somers
59e1ebc9b5 Change the following kernel options to environment variables:
    BOOTP -> bootp
    BOOTP_NFSROOT -> bootp.nfsroot
    BOOTP_NFSV3 -> bootp.nfsv3
    BOOTP_COMPAT -> bootp.compat
    BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

    bootp="YES"
    bootp.nfsroot="YES"
    bootp.nfsv3="YES"
    bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.
2004-07-08 13:40:33 +00:00
Bruce M Simpson
4f450ff9a5 Check that m->m_pkthdr.rcvif is not NULL before checking if a packet
was received on a broadcast address on the input path. Under certain
circumstances this could result in a panic, notably for locally-generated
packets which do not have m_pkthdr.rcvif set.

This is a similar situation to that which is solved by
src/sys/netinet/ip_icmp.c rev 1.66.

PR:		kern/52935
2004-06-18 12:58:45 +00:00
Bruce M Simpson
57ab3660ff In ip_forward(), when calculating the MTU in effect for an IPSEC transport
mode tunnel, take the per-route MTU into account, *if* and *only if* it
is non-zero (as found in struct rt_metrics/rt_metrics_lite).

PR:		kern/42727
Obtained from:	NetBSD (ip_input.c rev 1.151)
2004-06-16 08:33:09 +00:00
Bruce M Simpson
e6b0a57025 In ip_forward(), set m->m_pkthdr.len correctly such that the mbuf chain
is sane, and ipsec4_getpolicybyaddr() will therefore complete.

PR:		kern/42727
Obtained from:	KAME (kame/freebsd4/sys/netinet/ip_input.c rev 1.42)
2004-06-16 08:28:54 +00:00
Max Laier
02b199f158 Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
separate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
Andre Oppermann
2bde81acd6 Provide the sysctl net.inet.ip.process_options to control the processing
of IP options.

 net.inet.ip.process_options=0  Ignore IP options and pass packets unmodified.
 net.inet.ip.process_options=1  Process all IP options (default).
 net.inet.ip.process_options=2  Reject all packets with IP options with ICMP
  filter prohibited message.

This sysctl affects packets destined for the local host as well as those
only transiting through the host (routing).

IP options do not have any legitimate purpose anymore and are only used
to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
stacks.

Reviewed by:	sam (mentor)
2004-05-06 18:46:03 +00:00
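The three settings of net.inet.ip.process_options listed above map onto a simple decision. The sketch below is an illustration with hypothetical names (`enum pkt_action`, `ip_options_action()`); it only models the decision, not the option parsing or the ICMP reply itself.

```c
#include <assert.h>
#include <stdbool.h>

enum pkt_action {
    PKT_PASS_UNMODIFIED,   /* options, if any, are left untouched */
    PKT_PROCESS_OPTIONS,   /* parse and act on the IP options */
    PKT_REJECT_ICMP        /* drop, answer with ICMP filter prohibited */
};

/*
 * Decide what to do with a packet given the sysctl value (0, 1 or 2)
 * and whether the packet carries IP options at all.
 */
enum pkt_action
ip_options_action(int process_options, bool has_options)
{
    assert(process_options >= 0 && process_options <= 2);

    if (!has_options)
        return PKT_PASS_UNMODIFIED;   /* nothing to process or reject */

    switch (process_options) {
    case 0:
        return PKT_PASS_UNMODIFIED;
    case 2:
        return PKT_REJECT_ICMP;
    default:
        return PKT_PROCESS_OPTIONS;   /* 1 is the default setting */
    }
}
```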
Darren Reed
2f3f1e6773 Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.
2004-05-02 15:10:17 +00:00
Darren Reed
ab884d993e Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside
all of the other macros that work on mbufs and tags.
2004-05-02 06:36:30 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Robert Watson
7101d752b2 Invert the logic of NET_LOCK_GIANT(), and remove the one reference to it.
Previously, Giant would be grabbed at entry to the IP local delivery code
when debug.mpsafenet was set to true, as that implied Giant wouldn't be
grabbed in the driver path.  Now, we will use this primitive to
conditionally grab Giant in the event the entire network stack isn't
running MPSAFE (debug.mpsafenet == 0).
2004-03-28 23:12:19 +00:00
Robert Watson
6200a93f82 Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT()
to NET_UNLOCK_GIANT().  While they are used in similar ways, the
semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT()
directly wrap mutex lock and unlock operations, whereas drop/pickup
special case the handling of Giant recursion.  Add a comment saying
as much.

Add NET_ASSERT_GIANT(), which conditionally asserts Giant based
on the value of debug_mpsafenet.
2004-03-01 22:37:01 +00:00
Robert Watson
768bbd68cc Remove unneeded {} originally used to hold local variables for dummynet
in a code block, as the variable is now gone.

Submitted by:	sam
2004-02-28 19:50:43 +00:00
Max Laier
ac9d7e2618 Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam
2004-02-25 19:55:29 +00:00
Max Laier
36e8826ffb Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)
2004-02-18 00:04:52 +00:00
Max Laier
189a0ba4e7 Do not check receive interface when pfil(9) hook changed address.
Approved by: bms(mentor)
2004-02-13 19:20:43 +00:00
Max Laier
1094bdca51 This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson
2004-02-13 19:14:16 +00:00
Poul-Henning Kamp
be8a62e821 Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead.  Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.
2004-01-31 10:40:25 +00:00
Andre Oppermann
0cfbbe3bde Make sure all uses of stack allocated struct route's are properly
zeroed.  Doing a bzero on the entire struct route is not more
expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst.

Reviewed by:	sam (mentor)
Approved by:	re  (scottl)
2003-11-26 20:31:13 +00:00
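As a hedged illustration of the zeroing described in 0cfbbe3bde above, the sketch below uses a simplified stand-in for struct route. The names route_sketch, route_init and route_dst_is_zero are hypothetical, and the real kernel structure has different field types. It shows a single clear of the whole structure covering both the cached-route pointer and the destination, with memset() standing in for the kernel's bzero().

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct route; not the real layout. */
struct route_sketch {
    void *ro_rt;               /* cached routing entry, NULL when unset */
    unsigned char ro_dst[16];  /* destination address storage */
};

/* Clear the whole structure in one call instead of field by field. */
static void route_init(struct route_sketch *ro)
{
    assert(ro != NULL);
    memset(ro, 0, sizeof(*ro));
}

/* Returns 1 when every byte of the destination is zero, 0 otherwise. */
static int route_dst_is_zero(const struct route_sketch *ro)
{
    for (size_t i = 0; i < sizeof(ro->ro_dst); i++)
        if (ro->ro_dst[i] != 0)
            return 0;
    return 1;
}
```

A stack-allocated route_sketch passed through route_init() starts with no stale pointer and no stale destination bytes, which is the property the commit is after.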
Andre Oppermann
97d8d152c2 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
Andre Oppermann
26d02ca7ba Remove RTF_PRCLONING from routing table and adjust users of it
accordingly.  The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache.  The
network stack remains fully useable with this change.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 19:47:31 +00:00
Brian Feldman
633461295a Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but
do not have mh_nextpkt initialized.  Sometimes what's there is "1", and the
ip_input() code pukes trying to m_free() it, rendering divert sockets and
such broken.
This really underscores the need to get rid of MT_TAG.

Reviewed by:	rwatson
2003-11-17 03:17:49 +00:00
Andre Oppermann
c76ff7084f Make ipstealth global as we need it in ip_fastforward too.
2003-11-15 01:45:56 +00:00
Andre Oppermann
02c1c7070e Remove the global one-level rtcache variable and associated
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adapt all its callers to this and make ip_output() callable
with NULL rt pointer.

Reviewed by:	sam (mentor)
2003-11-14 21:48:57 +00:00
Andre Oppermann
9188b4a169 Introduce ip_fastforward and remove ip_flow.
Short description of ip_fastforward:

 o adds full direct process-to-completion IPv4 forwarding code
 o handles ip fragmentation incl. hw support (ip_flow did not)
 o sends icmp needfrag to source if DF is set (ip_flow did not)
 o supports ipfw and ipfilter (ip_flow did not)
 o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
 o returns anything it can't handle back to normal ip_input

Enable with sysctl -w net.inet.ip.fastforwarding=1

Reviewed by:	sam (mentor)
2003-11-14 21:02:22 +00:00
Sam Leffler
7138d65c3f replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by:	FreeBSD Foundation
2003-11-08 23:36:32 +00:00
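The idea behind 7138d65c3f above, reference-count updates wrapped in macros that also check invariants, can be sketched roughly as follows. This is an assumption-laden illustration: rt_sketch, RT_ADDREF_SKETCH and RT_REMREF_SKETCH are hypothetical names, and the standard assert() stands in for whatever assertion facility the kernel uses when built with INVARIANTS.

```c
#include <assert.h>

/* Hypothetical stand-in for a routing entry; only the counter matters here. */
struct rt_sketch {
    long rt_refcnt;
};

/* Take a reference; a negative count means the entry is already corrupt. */
#define RT_ADDREF_SKETCH(rt) do {        \
    assert((rt)->rt_refcnt >= 0);        \
    (rt)->rt_refcnt++;                   \
} while (0)

/* Drop a reference; dropping below zero is a bug, so check first. */
#define RT_REMREF_SKETCH(rt) do {        \
    assert((rt)->rt_refcnt > 0);         \
    (rt)->rt_refcnt--;                   \
} while (0)
```

Compared with open-coded rt_refcnt++ and rt_refcnt--, every call site gets the same sanity check, and compiling with NDEBUG removes the checks in this userland sketch.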
Sam Leffler
7902224c6b o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
  operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
  calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
  Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
  have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by:	FreeBSD Foundation
2003-11-08 22:28:40 +00:00
Sam Leffler
ad67584665 Fix locking of the ip forwarding cache. We were holding a reference
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd.  This caused major havoc (for some
reason it appeared most frequently for folks running natd).  Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us.  This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.

Supported by:	FreeBSD Foundation
2003-11-07 01:47:52 +00:00
Hajimu UMEMOTO
0f9ade718d - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- a user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented incompletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
Robert Watson
eecfe773aa Remove comment about desire for eventual explicit labeling of ICMP
header copy made on input path: this is now handled differently.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-03 18:01:38 +00:00
Hajimu UMEMOTO
59dfcba4aa add ECN support in layer-3.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
  make ip{,6}_ecn_egress() return integer to tell the caller that
  this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.

Obtained from:	KAME
2003-10-29 15:07:04 +00:00
Sam Leffler
f51f805f7e pfil hooks can modify packet contents so check if the destination
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.

Supported by:	FreeBSD Foundation
Submitted by:	Pyun YongHyeon
2003-10-16 16:25:25 +00:00
Sam Leffler
b35a1e5d66 purge extraneous ';'s
Supported by:	FreeBSD Foundation
Noticed by:	bde
2003-10-15 18:19:28 +00:00
Sam Leffler
929b31ddab Lock ip forwarding route cache. While we're at it, remove the global
variable ipforward_rt by introducing an ip_forward_cacheinval() call
to use to invalidate the cache.

Supported by:	FreeBSD Foundation
2003-10-14 19:19:12 +00:00
Sam Leffler
888c2a3c4e remove dangling ';'s that were harmless
Supported by:	FreeBSD Foundation
2003-10-14 18:45:50 +00:00
Sam Leffler
134ea22494 o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by:	"Max Laier" <max@love2party.net>
Supported by:		FreeBSD Foundation
Obtained from:		NetBSD (bits of pfil.h and pfil.c)
2003-09-23 17:54:04 +00:00
Sam Leffler
2fad1e931e lock ip fragment queues
Submitted by:	Robert Watson <rwatson@freebsd.org>
Obtained from:	BSD/OS
2003-09-05 00:10:33 +00:00
Sam Leffler
1f76a5e218 add IPSEC_FILTERGIF support for FAST_IPSEC
PR:		kern/51922
Submitted by:	Eric Masson <e-masson@kisoft-services.com>
MFC after:	1 week
2003-07-22 18:58:34 +00:00
Mike Silbersack
fcaf9f9146 Map icmp time exceeded responses to EHOSTUNREACH rather than 0 (no error);
this makes connect act more sensibly in these cases.

PR:				50839
Submitted by:			Barney Wolff <barney@pit.databus.com>
Patch delayed by laziness of:	silby
MFC after:			1 week
2003-06-17 06:21:08 +00:00
Robert Watson
042bbfa3b5 When setting fragment queue pointers to NULL, or comparing them with
NULL, use NULL rather than 0 to improve readability.
2003-06-06 19:32:48 +00:00
Robert Watson
688fe1d954 Trim a call to mac_create_mbuf_from_mbuf() since m_tag meta-data
copying for mbuf headers now works properly in m_dup_pkthdr(), so
we don't need to do an explicit copy.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-06 20:34:04 +00:00
Matthew N. Dodd
4957466b8e IP_RECVTTL socket option.
Reviewed by:	Stuart Cheshire <cheshire@apple.com>
2003-04-29 21:36:18 +00:00
Dag-Erling Smørgrav
fe58453891 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
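A rough sketch of the pattern introduced by fe58453891 above: one macro replaces the hand-rolled "does this mbuf carry a packet header" checks. The struct, the flag value and all _SKETCH/_sketch names here are assumptions for illustration, and assert() stands in for the kernel's own assertion mechanism.

```c
#include <assert.h>
#include <stddef.h>

#define M_PKTHDR_SKETCH 0x0002   /* illustrative flag value */

/* Hypothetical, heavily reduced mbuf. */
struct mbuf_sketch {
    int m_flags;        /* M_PKTHDR_SKETCH set when a packet header is present */
    int m_pkthdr_len;   /* total packet length, valid only with a header */
};

/* Assert that the mbuf exists and carries a packet header. */
#define M_ASSERTPKTHDR_SKETCH(m) \
    assert((m) != NULL && ((m)->m_flags & M_PKTHDR_SKETCH) != 0)

/* Example caller: reading the header field is only valid after the check. */
static int pkt_len_sketch(const struct mbuf_sketch *m)
{
    M_ASSERTPKTHDR_SKETCH(m);
    return m->m_pkthdr_len;
}
```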
Matthew N. Dodd
2c56e246fa Back out support for RFC3514.
RFC3514 poses an unacceptable risk to compliant systems.
2003-04-02 20:14:44 +00:00
Matthew N. Dodd
8faf6df9b3 Sync constant define with NetBSD.
Requested by:	 Tom Spindler <dogcow@babymeat.com>
2003-04-02 10:28:47 +00:00
Matthew N. Dodd
09139a4537 Implement support for RFC 3514 (The Security Flag in the IPv4 Header).
(See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt)

This fulfills the host requirements for userland support by
way of the setsockopt() IP_EVIL_INTENT message.

There are three sysctl tunables provided to govern system behavior.

	net.inet.ip.rfc3514:

		Enables support for rfc3514.  As this is an
		Informational RFC and support is not yet widespread
		this option is disabled by default.

	net.inet.ip.hear_no_evil

		 If set the host will discard all received evil packets.

	net.inet.ip.speak_no_evil

		If set the host will discard all transmitted evil packets.

The IP statistics counter 'ips_evil' (available via 'netstat') provides
information on the number of 'evil' packets received.

For reference, the '-E' option to 'ping' has been provided to demonstrate
and test the implementation.
2003-04-01 08:21:44 +00:00
Robert Watson
5e7ce4785f Modify the mac_init_ipq() MAC Framework entry point to accept an
additional flags argument to indicate blocking disposition, and
pass in M_NOWAIT from the IP reassembly code to indicate that
blocking is not OK when labeling a new IP fragment reassembly
queue.  This should eliminate some of the WITNESS warnings that
have started popping up since fine-grained IP stack locking
started going in; if memory allocation fails, the creation of
the fragment queue will be aborted.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-26 15:12:03 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
Mike Silbersack
a75a485d62 Fix a condition so that ip reassembly queues are emptied immediately
when maxfragpackets is dropped to 0.

Noticed by:	bmah
2003-02-26 07:28:35 +00:00
Maxim Konovalov
b36f5b3735 style(9): join lines.
2003-02-25 11:53:11 +00:00
Maxim Konovalov
99e8617d24 Ip reassembly queue structure has ipq_nfrags now. Count a number of
dropped ip fragments precisely.

Reviewed by:	silby
2003-02-25 11:49:01 +00:00
Sam Leffler
14dd6717f8 Add a new config option IPSEC_FILTERGIF to control whether or not
packets coming out of a GIF tunnel are re-processed by ipfw, et al.
By default they are not reprocessed.  With the option they are.

This reverts 1.214.  Prior to that change packets were not re-processed.
After they were which caused problems because packets do not have
distinguishing characteristics (like a special network if) that allows
them to be filtered specially.

This is really a stopgap measure designed for immediate MFC so that
4.8 has consistent handling to what was in 4.7.

PR:		48159
Reviewed by:	Guido van Rooij <guido@gvr.org>
MFC after:	1 day
2003-02-23 00:47:06 +00:00
Mike Silbersack
375386e284 Add the ability to limit the number of IP fragments allowed per packet,
and enable it by default, with a limit of 16.

At the same time, tweak maxfragpackets downward so that in the worst
possible case, IP reassembly can use only 1/2 of all mbuf clusters.

MFC after: 	3 days
Reviewed by:	hsu
Liked by:	bmah
2003-02-22 06:41:47 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Mike Silbersack
ecf44c01f4 Move a comment and optimize the frag timeout code a slight bit.
Submitted by:	maxim
MFC with:	The previous two revisions
2003-02-01 05:59:51 +00:00
Mike Silbersack
ac64c8668b A few fixes to rev 1.221
- Honor the previous behavior of maxfragpackets = 0 or -1
- Take a better stab at fragment statistics
- Move / correct a comment

Suggested by:	maxim@
MFC after:	7 days
2003-01-28 03:39:39 +00:00
Mike Silbersack
402062e80c Merge the best parts of maxfragpackets and maxnipq together. (Both
functions implemented approximately the same limits on fragment memory
usage, but in different fashions.)

End user visible changes:
- Fragment reassembly queues are freed in a FIFO manner when maxfragpackets
  has been reached, rather than all reassembly stopping.

MFC after: 	5 days
2003-01-26 01:44:05 +00:00
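The user-visible change in 402062e80c above (free the oldest reassembly queue when the limit is reached, rather than refusing all new reassembly) can be sketched as a bounded FIFO. This is a minimal userland illustration under stated assumptions: ipq_sketch, ipq_list, ipq_add and ipq_drop_oldest are hypothetical names, `limit` stands in for maxfragpackets, and treating a non-positive limit as "accept nothing" is a simplification of this sketch, not a statement about the kernel's handling of 0 or -1.

```c
#include <assert.h>
#include <stdlib.h>

/* One reassembly queue, reduced to an identifier and a link. */
struct ipq_sketch {
    unsigned id;
    struct ipq_sketch *next;
};

/* FIFO of reassembly queues: head is the oldest, tail the newest. */
struct ipq_list {
    struct ipq_sketch *head;
    struct ipq_sketch *tail;
    int count;
    int limit;   /* stand-in for maxfragpackets */
};

/* Free the oldest queue and return its id; the list must be non-empty. */
static unsigned ipq_drop_oldest(struct ipq_list *l)
{
    struct ipq_sketch *q = l->head;
    unsigned id;

    assert(q != NULL);
    id = q->id;
    l->head = q->next;
    if (l->head == NULL)
        l->tail = NULL;
    l->count--;
    free(q);
    return id;
}

/* Append a new queue, evicting the oldest ones first if at the limit.
 * Returns 0 on success, -1 if the limit is non-positive or malloc fails. */
static int ipq_add(struct ipq_list *l, unsigned id)
{
    struct ipq_sketch *q;

    if (l->limit <= 0)
        return -1;
    while (l->count >= l->limit)
        ipq_drop_oldest(l);
    q = malloc(sizeof(*q));
    if (q == NULL)
        return -1;
    q->id = id;
    q->next = NULL;
    if (l->tail != NULL)
        l->tail->next = q;
    else
        l->head = q;
    l->tail = q;
    l->count++;
    return 0;
}
```

With a limit of 2, adding queues 1, 2 and 3 leaves 2 and 3 in the list: the newest arrival is always admitted and the oldest partial datagram pays for it.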
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Sam Leffler
9967cafc49 Correct mbuf packet header propagation. Previously, packet headers
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a  "copy" operation.  This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain.  This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.

These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block.  This
introduces an incompatibility with openbsd which we may want to revisit.

Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them.  We may want to support this;
for now we watch for it with an assert.

Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.

Supported by:	Vernier Networks
Reviewed by:	Robert Watson <rwatson@FreeBSD.org>
2002-12-30 20:22:40 +00:00
Luigi Rizzo
97850a5dd9 Move fw_one_pass from ip_fw2.c to ip_input.c so that neither
bridge.c nor if_ethersubr.c depend on IPFIREWALL.
Restore the use of fw_one_pass in if_ethersubr.c

ipfw.8 will be updated with a separate commit.

Approved by: re
2002-11-20 19:07:27 +00:00
Mike Silbersack
df285b3d1d Add a sysctl to control the generation of source quench packets,
and set it to 0 by default.

Partially obtained from:	NetBSD
Suggested by:	David Gilbert
MFC after:	5 days
2002-11-19 17:06:06 +00:00
Luigi Rizzo
bbb4330b61 Massive cleanup of the ip_mroute code.
No functional changes, but:

  + the mrouting module now should behave the same as the compiled-in
    version (it did not before, some of the rsvp code was not loaded
    properly);
  + netinet/ip_mroute.c is now truly optional;
  + removed some redundant/unused code;
  + changed many instances of '0' to NULL and INADDR_ANY as appropriate;
  + removed several static variables to make the code more SMP-friendly;
  + fixed some minor bugs in the mrouting code (mostly, incorrect return
    values from functions).

This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).

Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.

Detailed changes:
--------------------
netinet/ip_mroute.c     all the above.
conf/files              make ip_mroute.c optional
net/route.c             fix mrt_ioctl hook
netinet/ip_input.c      fix ip_mforward hook, move rsvp_input() here
                        together with other rsvp code, and a couple
                        of indentation fixes.
netinet/ip_output.c     fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h        rsvp function hooks
netinet/raw_ip.c        hooks for mrouting and rsvp functions, plus
                        interface cleanup.
netinet/ip_mroute.h     remove an unused and optional field from a struct

Most of the code is from Pavlin Radoslavov and the XORP project

Reviewed by: sam
MFC after: 1 week
2002-11-15 22:53:53 +00:00
Poul-Henning Kamp
53be11f680 Fix two instances of variant struct definitions in sys/netinet:
Remove the never completed _IP_VHL version, it has not caught on
anywhere and it would make us incompatible with other BSD netstacks
to retain this version.

Add a CTASSERT protecting sizeof(struct ip) == 20.

Don't let the size of struct ipq depend on the IPDIVERT option.

This is a functional no-op commit.

Approved by:	re
2002-10-20 22:52:07 +00:00
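The compile-time size check mentioned in 53be11f680 above can be approximated in standard C11 with static_assert. The sketch assumes a simplified header made only of fixed-width fields; ip_sketch is a hypothetical name and not the kernel's struct ip, and folding version and header length into one byte is purely a simplification of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified IPv4 header built from fixed-width fields. */
struct ip_sketch {
    uint8_t  ip_first;  /* version and header length, kept as one byte here */
    uint8_t  ip_tos;    /* type of service */
    uint16_t ip_len;    /* total length */
    uint16_t ip_id;     /* identification */
    uint16_t ip_off;    /* flags and fragment offset */
    uint8_t  ip_ttl;    /* time to live */
    uint8_t  ip_p;      /* protocol */
    uint16_t ip_sum;    /* header checksum */
    uint32_t ip_src;    /* source address */
    uint32_t ip_dst;    /* destination address */
};

/* Fails the build if padding or a field change alters the header size. */
static_assert(sizeof(struct ip_sketch) == 20, "IPv4 header must be 20 bytes");
```

The check costs nothing at run time; a definition that drifts from the 20-byte wire format stops compiling instead of silently misparsing packets.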
Guido van Rooij
2f591ab8fe Get rid of checking for ip sec history. It is true that packets are not
supposed to be checked by the firewall rules twice. However, because the
various ipsec handlers never call ip_input(), this never happens anyway.

This fixes the situation where a gif tunnel is encrypted with IPsec. In
such a case, after IPsec processing, the unencrypted contents from the
GIF tunnel are fed back to the ipintrq and subsequently handled by
ip_input(). Yet, since there still is IPSec history attached, the
packets coming out from the gif device are never fed into the filtering
code.
This fix was sent to Itojun, and he pointed towards
    http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction.
This patch actually implements what is stated there (specifically:
Packet came from tunnel devices (gif(4) and ipip(4)) will still
go through ipf(4). You may need to identify these packets by
using interface name directive in ipf.conf(5).

Reviewed by:	rwatson
MFC after:	3 weeks
2002-10-16 09:01:48 +00:00
Sam Leffler
b9234fafa0 Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implementation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
Sam Leffler
5d84645305 Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
Maxim Konovalov
a5428e3a9a Fix IPOPT_TS processing: do not overwrite IP address by timestamp.
PR:		misc/42121
Submitted by:	Praveen Khurjekar <praveen@codito.com>
Reviewed by:	silence on -net
MFC after:	1 month
2002-10-10 12:03:36 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Poul-Henning Kamp
a5554bf05b Use m_fixhdr() rather than roll our own.
2002-09-18 19:43:01 +00:00
Maxim Konovalov
1cf4349926 Explicitly clear M_FRAG flag on a mbuf with the last fragment to unbreak
ip fragments reassembling for loopback interface.

Discussed with:	bde, jlemon
Reviewed by:	silence on -net
MFC after:	2 weeks
2002-09-17 11:20:02 +00:00
Luigi Rizzo
ea779ff36c Fix handling of packets which matched an "ipfw fwd" rule on the input side.
2002-08-03 14:59:45 +00:00
Robert Watson
e316463a86 When preserving the IP header in extra mbuf in the IP forwarding
case, also preserve the MAC label.  Note that this mbuf allocation
is fairly non-optimal, but not my fault.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-02 20:45:27 +00:00
Robert Watson
36b0360b37 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the code managing IP fragment reassembly queues (struct ipq)
to invoke appropriate MAC entry points to maintain a MAC label on
each queue.  Permit MAC policies to associate information with a queue
based on the mbuf that caused it to be created, update that information
based on further mbufs accepted by the queue, influence the decision
making process by which mbufs are accepted to the queue, and set the
label of the mbuf holding the reassembled datagram following reassembly
completion.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:17:51 +00:00
Maxime Henrion
7627c6cbcc Warning fixes for 64 bits platforms. With this last fix,
I can build a GENERIC sparc64 kernel with -Werror.

Reviewed by:	luigi
2002-06-27 11:02:06 +00:00
Luigi Rizzo
4d2e36928d Move some global variables in more appropriate places.
Add XXX comments to mark places which need to be taken care of
if we want to remove this part of the kernel from Giant.

Add a comment on a potential performance problem with ip_forward()
2002-06-23 20:48:26 +00:00
Luigi Rizzo
51aed12e52 fix bad indentation and whitespace resulting from cut&paste
2002-06-23 09:15:43 +00:00
Luigi Rizzo
2b25acc158 Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

        ip_divert_cookie        used by divert sockets
        ip_fw_fwd_addr          used for transparent ip redirection
        last_pkt                used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

In passing, remove a bug in divert handling of fragmented packets.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). In passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
 * there is at least one global variable left, sro_fwd, in ip_output().
   I am not sure if/how this can be removed.

 * I have deliberately avoided gratuitous style changes in this commit
   to avoid cluttering the diffs. Minor style cleanup will likely be
   necessary.

 * this commit only focused on the IP layer. I am sure there is a
   number of global variables used in the TCP and maybe UDP stack.

 * despite the number of files touched, there are absolutely no API's
   or data structures changed by this commit (except the interfaces of
   ip_fw_chk() and dummynet_io(), which are internal anyways), so
   an MFC is quite safe and unintrusive (and desirable, given the
   improved readability of the code).

MFC after: 10 days
2002-06-22 11:51:02 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my last commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Andrew R. Reiter
db40007d42 - Change the newly turned INVARIANTS #ifdef blocks (they were changed from
DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code
  readability.
2002-05-21 18:52:24 +00:00
Andrew R. Reiter
e16f6e6200 - Turn a #ifdef DIAGNOSTIC to #ifdef INVARIANTS as the code from this line
through the #endif is really a sanity check.

Reviewed by: jake
2002-05-20 21:50:39 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each member in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
Dima Dorfman
11612afabe s/demon/daemon/
2002-05-12 00:22:38 +00:00
Luigi Rizzo
d60315bef5 Cleanup the interface to ip_fw_chk, two of the input arguments
were totally useless and have been removed.

ip_input.c, ip_output.c:
    Properly initialize the "ip" pointer in case the firewall does an
    m_pullup() on the packet.

    Remove some debugging code forgotten long ago.

ip_fw.[ch], bridge.c:
    Prepare the grounds for matching MAC header fields in bridged packets,
    so we can have 'etherfw' functionality without a lot of kernel and
    userland bloat.
2002-05-09 10:34:57 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
Alfred Perlstein
4d77a549fe Remove __P.
2002-03-19 21:25:46 +00:00
Chris D. Faulhaber
546f251b29 Enforce inbound IPsec SPD
Reviewed by:	fenner
2002-02-26 02:11:13 +00:00
Mike Barcroft
fd8e4ebc8c o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated macros in the
  source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
  Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
  POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
  and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
  complexities associated with having MD (asm and inline) versions, and
  having to prevent exposure of these functions in other headers that
  happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
  third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on:	alpha, i386
Reviewed by:	bde, jake, tmm
2002-02-18 20:35:27 +00:00
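For readers unfamiliar with the distinction drawn in fd8e4ebc8c above: as far as I understand the historical BSD definitions, the uppercase NTOHL(x) macro rewrote its argument in place, whereas the POSIX ntohl() function returns the converted value. A small sketch of the lowercase style follows; seq_to_host is a hypothetical helper name used only for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/*
 * Old style (deprecated by the commit): NTOHL(seq); modified seq in place.
 * New style: call the lowercase function and use or assign its result.
 */
static uint32_t seq_to_host(uint32_t wire_seq)
{
    return ntohl(wire_seq);   /* network (big-endian) to host byte order */
}
```

The function form composes inside expressions and does not require an lvalue, which is part of why the lowercase variants are preferred.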
Crist J. Clark
56962689cc The ipfw(8) 'tee' action simply hasn't worked on incoming packets for
some time. _All_ packets, regardless of destination, were accepted by
the machine as if addressed to it.

Jump back to 'pass' processing for a teed packet instead of falling
through as if it was ours.

PR:		kern/31130
Reviewed by:	-net, luigi
MFC after:	2 weeks
2002-01-26 10:14:08 +00:00
Mike Smith
bedbd47e6a Initialise the intrq_present fields at runtime, not link time. This allows
us to load protocols at runtime, and avoids the use of common variables.

Also fix the ip6_intrq assignment so that it works at all.
2002-01-08 10:34:03 +00:00
Yaroslav Tykhiy
d0ebc0d2f1 Don't reveal a router in the IPSTEALTH mode through IP options.
The following steps are involved:
a) the IP options related to routing (LSRR and SSRR) are processed
   as though the router were a host,
b) the other IP options are processed as usual only if the packet
   is destined for the router; otherwise they are ignored.

PR:		kern/23123
Discussed in:	freebsd-hackers
2001-12-29 09:24:18 +00:00
Julian Elischer
3efc30142c Fix ipfw fwd so that it acts as the docs say
when forwarding an incoming packet to another machine.

Obtained from:	Vicor Production tree
MFC after: 3 weeks
2001-12-28 21:21:57 +00:00
Jonathan Lemon
6f00486cfd minor style and whitespace fixes.
2001-12-14 19:33:29 +00:00
Ruslan Ermilov
bd7142087b - Make ip_rtaddr() global, and use it to look up the correct source
address in icmp_reflect().
- Two new "struct icmpstat" members: icps_badaddr and icps_noroute.

PR:		kern/31575
Obtained from:	BSD/OS
MFC after:	1 week
2001-11-30 10:40:28 +00:00
Luigi Rizzo
7b109fa404 MFS: sync the ipfw/dummynet/bridge code with the one recently merged
into stable (mostly, but not only, formatting and comments changes).
2001-11-04 22:56:25 +00:00
Jonathan Lemon
0751407193 Don't use the ip_timestamp structure to access timestamp options, as the
compiler may cause an unaligned access to be generated in some cases.

PR: 30982
2001-10-25 06:27:51 +00:00
Paul Saab
db69a05dce Make it so dummynet and bridge can be loaded as modules.
Submitted by:	billf
2001-10-05 05:45:27 +00:00
Jonathan Lemon
ca925d9c17 Add a hash table that contains the list of internet addresses, and use
this in place of the in_ifaddr list when appropriate.  This improves
performance on hosts which have a large number of IP aliases.
2001-09-29 04:34:11 +00:00
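A rough sketch of the idea in ca925d9c17: hash the IPv4 address into a small table so that "is this address ours?" no longer walks the whole address list. All names here (addr_entry, addr_hash, ADDR_HASH_SIZE, addr_hashval) are hypothetical, and the hash function is an arbitrary choice for illustration, not the one the kernel uses.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDR_HASH_SIZE	64	/* hypothetical; must be a power of two */

struct addr_entry {
	uint32_t		 addr;	/* IPv4 address */
	struct addr_entry	*next;	/* next entry in the same bucket */
};

static struct addr_entry *addr_hash[ADDR_HASH_SIZE];

/* Fold the address into a bucket index (illustrative hash only). */
static unsigned
addr_hashval(uint32_t addr)
{
	return ((addr ^ (addr >> 8) ^ (addr >> 16)) & (ADDR_HASH_SIZE - 1));
}

/* Link an entry at the head of its bucket. */
static void
addr_insert(struct addr_entry *e)
{
	unsigned h = addr_hashval(e->addr);

	e->next = addr_hash[h];
	addr_hash[h] = e;
}

/* Return 1 if addr is in the table; only one bucket is scanned. */
static int
addr_is_local(uint32_t addr)
{
	struct addr_entry *e;

	for (e = addr_hash[addr_hashval(addr)]; e != NULL; e = e->next)
		if (e->addr == addr)
			return (1);
	return (0);
}
```

With many aliases configured, a lookup touches only the entries that share a bucket rather than every configured address.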
Jonathan Lemon
9a10980e2a Centralize satosin(), sintosa() and ifatoia() macros in <netinet/in.h>
Remove local definitions.
2001-09-29 03:23:44 +00:00
Luigi Rizzo
830cc17841 Two main changes here:
 + implement "limit" rules, which limit the number of sessions
   between certain host pairs (according to masks). These are a special
   type of stateful rules, which might be of interest in some cases.
   See the ipfw manpage for details.

 + merge the list pointers and ipfw rule descriptors in the kernel, so
   the code is smaller, faster and more readable. This patch basically
   consists of replacing "foo->rule->bar" with "rule->bar" all over
   the place.
   I have been wanting to do this for ages!

MFC after: 1 week
2001-09-27 23:44:27 +00:00
Brooks Davis
9494d5968f Make faith loadable, unloadable, and clonable. 2001-09-25 18:40:52 +00:00
Jonathan Lemon
f9132cebdc Wrap array accesses in macros, which also happen to be lvalues:
        ifnet_addrs[i - 1]  -> ifaddr_byindex(i)
        ifindex2ifnet[i]    -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
2001-09-06 02:40:43 +00:00
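The "macros which also happen to be lvalues" remark in f9132cebdc can be illustrated with a small sketch. The names ifnet_sketch, ifindex_table and ifnet_byindex_sketch are hypothetical stand-ins, not the kernel definitions; the point is only that a macro expanding to an array element can appear on either side of an assignment, so callers need not change when the storage behind the macro is later reworked.

```c
#include <stddef.h>

struct ifnet_sketch {			/* hypothetical stand-in type */
	int	if_index;
};

#define MAX_IFS	8			/* hypothetical table size */
static struct ifnet_sketch *ifindex_table[MAX_IFS];

/*
 * Expands to an array element, so the result is an lvalue and can be
 * both read and assigned through.
 */
#define ifnet_byindex_sketch(i)	(ifindex_table[(i)])
```

Usage would look like `ifnet_byindex_sketch(3) = ifp;` for a store and `ifp = ifnet_byindex_sketch(3);` for a load.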
Julian Elischer
f0ffb944d2 Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.
2001-09-03 20:03:55 +00:00
Jesper Skriver
3b8123b72c When net.inet.tcp.icmp_may_rst is enabled, report ECONNREFUSED rather
than ENETRESET to the application, as a RST would; this keeps us compatible
with most applications.

MFC candidate.

Submitted by:	Scott Renfro <scott@renfro.org>
Reviewed by:	Mike Silbersack <silby@silby.com>
2001-08-27 22:10:07 +00:00
Ruslan Ermilov
c73d99b567 Add netstat(1) knob to reset net.inet.{ip|icmp|tcp|udp|igmp}.stats.
For example, ``netstat -s -p ip -z'' will show and reset IP stats.

PR:		bin/17338
2001-06-23 17:17:59 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz, and some
critical problems found after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
Jesper Skriver
96c2b04290 Make the default value of net.inet.ip.maxfragpackets and
net.inet6.ip6.maxfragpackets dependent on nmbclusters,
defaulting to nmbclusters / 4

Reviewed by:	bde
MFC after:	1 week
2001-06-10 11:04:10 +00:00
Jesper Skriver
690a6055ff Prevent denial of service using bogus fragmented IPv4 packets.
An attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.

By setting an upper limit on the number of reassembly queues we
prevent this situation.

This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to 200,
as in the IPv6 case. This should be sufficient for most
systems, but you might want to increase it if you have
lots of TCP sessions.
I'm working on making the default value dependent on
nmbclusters.

If you want old behaviour (no upper limit) set this sysctl
to a negative value.

If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero).

Obtained from:	NetBSD
MFC after:	1 week
2001-06-03 23:33:23 +00:00
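The three sysctl settings described in 690a6055ff (negative, zero, positive) can be sketched as a single admission check. This is an illustration only: the function name frag_queue_allowed() is hypothetical, and whether the real comparison is strict or not at the boundary is an assumption made here.

```c
/*
 * Decide whether a new reassembly queue may be created.
 *   maxfragpackets < 0  : old behaviour, no upper limit
 *   maxfragpackets == 0 : accept no fragments at all
 *   maxfragpackets > 0  : allow up to that many queues
 * nqueues is the number of queues currently in use.
 * (Hypothetical helper; boundary handling is an assumption.)
 */
static int
frag_queue_allowed(int maxfragpackets, int nqueues)
{
	if (maxfragpackets < 0)
		return (1);
	if (maxfragpackets == 0)
		return (0);
	return (nqueues < maxfragpackets);
}
```

With the default of 200, the 201st distinct ip_id seen while 200 queues are pending is dropped instead of consuming more mbufs.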
Kris Kennaway
64dddc1872 Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.

Reviewed by:    -net
Obtained from:  OpenBSD
2001-06-01 10:02:28 +00:00
David E. O'Brien
240ef84277 Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile. 2001-06-01 09:51:14 +00:00
Jesper Skriver
2b1a209a17 Prevent denial of service using bogus fragmented IPv4 packets.
An attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.

By setting an upper limit on the number of reassembly queues we
prevent this situation.

This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to NMBCLUSTERS/4

If you want old behaviour (no upper limit) set this sysctl
to a negative value.

If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero)

Obtained from:	NetBSD (partially)
MFC after:	1 week
2001-05-31 21:57:29 +00:00
Ruslan Ermilov
1e3d5af041 Invalidate cached forwarding route (ipforward_rt) whenever a new route
is added to the routing table, otherwise we may end up using the wrong
route when forwarding.

PR:		kern/10778
Reviewed by:	silence on -net
2001-03-19 09:16:16 +00:00
Ruslan Ermilov
4078ffb154 Make sure the cached forwarding route (ipforward_rt) is still up before
using it.  Not checking this may have caused the wrong IP address to be
used when processing certain IP options (see example below).  This also
caused the wrong route to be passed to ip_output() when forwarding, but
fortunately ip_output() is smart enough to detect this.

This example demonstrates the wrong behavior of the Record Route option
observed with this bug.  Host ``freebsd'' is acting as the gateway for
the ``sysv''.

1. On the gateway, we add the route to the destination.  The new route
   will use the primary address of the loopback interface, 127.0.0.1:

:  freebsd# route add 10.0.0.66 -iface lo0 -reject
:  add host 10.0.0.66: gateway lo0

2. From the client, we ping the destination.  We see the correct replies.
   Please note that this also causes the relevant route on the ``freebsd''
   gateway to be cached in ipforward_rt variable:

:  sysv# ping -snv 10.0.0.66
:  PING 10.0.0.66: 56 data bytes
:  ICMP Host Unreachable from gateway 192.168.0.115
:  ICMP Host Unreachable from gateway 192.168.0.115
:  ICMP Host Unreachable from gateway 192.168.0.115
:
:  ----10.0.0.66 PING Statistics----
:  3 packets transmitted, 0 packets received, 100% packet loss

3. On the gateway, we delete the route to the destination, thus making
   the destination reachable through the `default' route:

:  freebsd# route delete 10.0.0.66
:  delete host 10.0.0.66

4. From the client, we ping destination again, now with the RR option
   turned on.  The surprise here is the 127.0.0.1 in the first reply.
   This is caused by the bug in ip_rtaddr() not checking the cached
   route is still up before use.  The debug code also shows that the
   wrong (down) route is further passed to ip_output().  The latter
   detects that the route is down, and replaces the bogus route with
   the valid one, so we see the correct replies (192.168.0.115) on
   further probes:

:  sysv# ping -snRv 10.0.0.66
:  PING 10.0.0.66: 56 data bytes
:  64 bytes from 10.0.0.66: icmp_seq=0. time=10. ms
:    IP options:  <record route> 127.0.0.1, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:  64 bytes from 10.0.0.66: icmp_seq=1. time=0. ms
:    IP options:  <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:  64 bytes from 10.0.0.66: icmp_seq=2. time=0. ms
:    IP options:  <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
:                                192.168.0.65, 192.168.0.115, 192.168.0.120,
:                                0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:
:  ----10.0.0.66 PING Statistics----
:  3 packets transmitted, 3 packets received, 0% packet loss
:  round-trip (ms)  min/avg/max = 0/3/10
2001-03-18 13:04:07 +00:00
Poul-Henning Kamp
462b86fe91 <sys/queue.h> makeover. 2001-03-16 20:00:53 +00:00
Ian Dowse
bfef7ed45c It was possible for ip_forward() to supply to icmp_error()
an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.

We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().

Also:
- Calculate the correct number of bytes that need to be
  retained for icmp_error(), instead of assuming that 64
  is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
  copy from the supplied mbuf chain, in case the first 8
  bytes of IP payload are not stored directly after the IP
  header.
- Sanity-check ip_len in icmp_error(), and panic if it is
  less than sizeof(struct ip). Incoming packets with bad
  ip_len values are discarded in ip_input(), so this should
  only be triggered by bugs in the code, not by bad packets.

This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.

Reported by:	Mike Tancsa <mike@sentex.net>
Reviewed by:	ru, bmilekic, jlemon, dillon
2001-03-08 19:03:26 +00:00
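The "64 is not enough" point in bfef7ed45c is simple arithmetic: an ICMP error quotes the IP header plus the first 8 bytes of payload, and an IP header with options can be up to 60 bytes long, so up to 68 bytes may be needed. A minimal sketch, assuming ip_len is already in host byte order; the function name icmp_quote_len() is hypothetical:

```c
/*
 * Number of bytes of the original datagram to retain for an ICMP
 * error: the IP header (ip_hl counts 32-bit words) plus 8 bytes of
 * payload, but never more than the datagram itself contains.
 * ip_len must be in host byte order.  (Hypothetical helper.)
 */
static unsigned
icmp_quote_len(unsigned ip_hl, unsigned ip_len)
{
	unsigned want = (ip_hl << 2) + 8;

	return (want < ip_len ? want : ip_len);
}
```

For a header without options (ip_hl = 5) this gives 28 bytes; for a maximal header (ip_hl = 15) it gives 68, which is why a fixed 64-byte copy could truncate the quoted data.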
Don Lewis
a8f1210095 Modify the comments to more closely resemble the English language. 2001-03-05 22:40:27 +00:00
Don Lewis
3f67c83439 Move the loopback net check closer to the beginning of ip_input() so that
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.

Add warning comments about the ip_checkinterface feature.
2001-03-05 08:45:05 +00:00
Don Lewis
e15ae1b226 Disable interface checking for packets subject to "ipfw fwd".
Chris Johnson <cjohnson@palomine.net> tested this fix in -stable.
2001-03-04 03:22:36 +00:00
Don Lewis
823db0e9dd Disable interface checking when IP forwarding is engaged so that packets
addressed to the interface on the other side of the box follow their
historical path.

Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().

Always check the arrival interface when matching the packet destination
against the interface broadcast addresses.  This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side.  This was broken
when the directed broadcast code was removed from revision 1.32.  If
the directed broadcast code was still present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.

Optimize the order of the tests.

Reviewed by:	jlemon
2001-03-04 01:39:19 +00:00
Jonathan Lemon
b3e95d4ed0 Add a new sysctl net.inet.ip.check_interface, which will verify that
an incoming packet arrives on an interface that has an address matching
the packet's address.  This is turned on by default.
2001-03-02 20:54:03 +00:00
Jonathan Lemon
7538a9a0f8 When iterating over our list of interface addresses in order to determine
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface.  Skip this check if the packet was locally
generated.
2001-02-27 19:43:14 +00:00
Jonathan Lemon
e4bb5b0572 Allow ICMP unreachables which map into PRC_UNREACH_ADMIN_PROHIB to
reset TCP connections which are in the SYN_SENT state, if the sequence
number in the echoed ICMP reply is correct.  This behavior can be
controlled by the sysctl net.inet.tcp.icmp_may_rst.

Currently, only subtypes 2,3,10,11,12 are treated as such
(port, protocol and administrative unreachables).

Associate an error code with these resets which is reported to the
user application: ENETRESET.

Disallow resetting TCP sessions which are not in a SYN_SENT state.

Reviewed by: jesper, -net
2001-02-23 20:51:46 +00:00
Jesper Skriver
43c77c8f5f Back out the change in 1.153, as it violates RFC 1122 section 3.2.1.3.
Requested by:	jlemon,ru
2001-02-21 16:59:47 +00:00
Jesper Skriver
2b18d82220 Send an ICMP unreachable instead of silently dropping the packet if we
receive a packet not for us and forwarding is disabled.

PR:		kern/24512
Reviewed by:	jlemon
Approved by:	jlemon
2001-02-20 21:31:47 +00:00
Poul-Henning Kamp
37d4006626 Another round of the <sys/queue.h> FOREACH transmogrifier.
Created with:   sed(1)
Reviewed by:    md5(1)
2001-02-04 16:08:18 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
Luigi Rizzo
507b4b5432 MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately) 2001-02-02 00:18:00 +00:00
Jonathan Lemon
df5e198723 Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue.  Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue.  An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
2000-11-25 07:35:38 +00:00
Ruslan Ermilov
60123168be Wrong checksum used for certain reassembled IP packets before diverting. 2000-11-01 11:21:45 +00:00
Poul-Henning Kamp
46aa3347cb Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernel code should *never* include from /usr/include!

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Partial reviews by:     various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
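The limitation named in 46aa3347cb (a macro that only takes a struct tag cannot be used with typedefs or unions) does not apply to the standard offsetof(). A small sketch with a hypothetical record_t type that has no struct tag at all and contains a union:

```c
#include <stddef.h>

/* Hypothetical type: typedef-only, no struct tag, with a union member. */
typedef struct {
	char	tag;
	union {
		int	i;
		double	d;
	} u;
	short	tail;
} record_t;

/* offsetof() takes any type name, so the typedef works directly. */
static size_t
record_tail_offset(void)
{
	return (offsetof(record_t, tail));
}

/* A member inside the union can be named as well. */
static size_t
record_union_offset(void)
{
	return (offsetof(record_t, u.d));
}
```

The exact values depend on the platform's padding rules, so the sketch deliberately states no expected numbers.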
Ruslan Ermilov
b6ea1aa58d RFC 791 says that IP_RF bit should always be zero, but nothing
in the code enforces this.  So, do not check for and attempt a
false reassembly if only IP_RF is set.

Also, removed the dead code, since we no longer use dtom() on
return from ip_reass().
2000-10-26 13:14:48 +00:00
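The check described in b6ea1aa58d can be sketched as follows. The flag values match the usual <netinet/ip.h> definitions; the function name ip_is_fragment() is hypothetical, and ip_off is assumed to be in host byte order. A packet is treated as a fragment only if "more fragments" is set or the fragment offset is non-zero; the reserved bit alone does not count.

```c
#include <stdint.h>

#define IP_RF		0x8000	/* reserved fragment flag */
#define IP_DF		0x4000	/* don't fragment flag */
#define IP_MF		0x2000	/* more fragments flag */
#define IP_OFFMASK	0x1fff	/* mask for fragment offset bits */

/*
 * Return non-zero if the packet needs reassembly.  IP_RF and IP_DF
 * are deliberately left out of the mask.  (Hypothetical helper;
 * ip_off must be in host byte order.)
 */
static int
ip_is_fragment(uint16_t ip_off)
{
	return ((ip_off & (IP_MF | IP_OFFMASK)) != 0);
}
```

Testing `ip_off & ~IP_DF` instead would also match a packet with only IP_RF set and send it into reassembly, which is the false reassembly the commit avoids.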
Ruslan Ermilov
7e2df4520d Wrong header length used for certain reassembled IP packets.
This was first fixed in rev 1.82 but then broken in rev 1.125.

PR:		6177
2000-10-26 12:18:13 +00:00
Josef Karthauser
5da9f8fa97 Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is invoked, instead of displaying the per-interface
stats.
2000-10-19 23:15:54 +00:00
Ruslan Ermilov
487bdb3855 Backout my wrong attempt to fix the compilation warning in ip_input.c
and instead reapply the revision 1.49 of mbuf.h, i.e.

Fixed regression of the type of the `header' member of struct pkthdr from
`void *' to caddr_t in rev.1.51.  This mainly caused an annoying warning
for compiling ip_input.c.

Requested by:	bde
2000-10-12 16:33:41 +00:00
Ruslan Ermilov
e6c89c1bd2 Fix the compilation warning. 2000-10-12 10:42:32 +00:00
Jonathan Lemon
a8db1d93f1 m_cat() can free its second argument, so collect the checksum information
from the fragment before calling m_cat().
2000-09-14 21:06:48 +00:00
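The ordering issue fixed in a8db1d93f1 is a general pattern: when a function may free one of its arguments, anything needed from that argument has to be read first. A minimal sketch with hypothetical types and names (pkt, frag, pkt_absorb) standing in for the mbuf chain and m_cat():

```c
#include <stddef.h>
#include <stdlib.h>

struct frag {			/* hypothetical stand-in for a fragment */
	int	csum_data;
	size_t	len;
};

struct pkt {			/* hypothetical stand-in for the queue head */
	int	csum_total;
	size_t	len;
};

/* Like m_cat() in the commit text, this consumes its second argument. */
static void
pkt_absorb(struct pkt *p, struct frag *f)
{
	p->len += f->len;
	free(f);
}

static void
pkt_add_fragment(struct pkt *p, struct frag *f)
{
	int csum = f->csum_data;	/* collect before f may be freed */

	pkt_absorb(p, f);
	p->csum_total += csum;		/* f must not be touched here */
}
```

Reading f->csum_data after pkt_absorb() would be a use-after-free, which is the class of bug the commit describes.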
Ruslan Ermilov
e30177e024 Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.
Requested by:	wollman
2000-09-14 14:42:04 +00:00
Ruslan Ermilov
04287599db Fixed broken ICMP error generation, unified conversion of IP header
fields between host and network byte order.  The details:

o icmp_error() now does not add IP header length.  This fixes the problem
  when icmp_error() is called from ip_forward().  In this case the ip_len
  of the original IP datagram returned with ICMP error was wrong.

o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host
  byte order, so DTRT and convert these fields back to network byte order
  before sending a message.  This fixes the problem described in PR 16240
  and PR 20877 (ip_id field was returned in host byte order).

o ip_ttl decrement operation in ip_forward() was moved down to make sure
  that it does not corrupt the copy of original IP datagram passed later
  to icmp_error().

o A copy of original IP datagram in ip_forward() was made a read-write,
  independent copy.  This fixes the problem I first reported to Garrett
  Wollman and Bill Fenner and later put in audit trail of PR 16240:
  ip_output() (not always) converts fields of original datagram to network
  byte order, but because copy (mcopy) and its original (m) most likely
  share the same mbuf cluster, ip_output()'s manipulations on original
  also corrupted the copy.

o ip_output() now expects all three fields, ip_len, ip_off and (what is
  significant) ip_id in host byte order.  It was a headache for years that
  ip_id was handled differently.  The only compatibility issue here is the
  raw IP socket interface with IP_HDRINCL socket option set and a non-zero
  ip_id field, but ip.4 manual page was unclear on whether in this case
  ip_id field should be in host or network byte order.
2000-09-01 12:33:03 +00:00
Andrey A. Chernov
c85540dd55 Nonexistent <sys/pfil.h> -> <net/pfil.h>
Kernel 'make depend' fails otherwise
2000-07-31 23:41:47 +00:00
Darren Reed
c4ac87ea1c activate pfil_hooks and convert ipfilter to use it 2000-07-31 13:11:42 +00:00
Jun-ichiro itojun Hagino
686cdd19b1 sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
  (also syntax change)
2000-07-04 16:35:15 +00:00
Jonathan Lemon
707d00a304 Add boundary checks against IP options.
Obtained from:	OpenBSD
2000-06-02 20:18:38 +00:00
Jonathan Lemon
5d5d5fc0bf Cast sizeof() calls to be of type (int) when they appear in a signed
integer expression.  Otherwise the sizeof() call will force the expression
to be evaluated as unsigned, which is not the intended behavior.

Obtained from:  NetBSD   (in a different form)
2000-05-17 04:05:07 +00:00
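The effect described in 5d5d5fc0bf can be shown with a short sketch. The struct hdr type and both function names are hypothetical. Because sizeof yields an unsigned size_t, an int operand is converted to unsigned before the subtraction, so a result that should be negative wraps to a very large value instead.

```c
#include <stddef.h>

struct hdr {			/* hypothetical 20-byte header */
	char	bytes[20];
};

/*
 * Without the cast: len is converted to size_t, so for len < 20 the
 * subtraction wraps around instead of going negative.
 */
static size_t
payload_len_unsigned(int len)
{
	return (len - sizeof(struct hdr));
}

/* With the cast: the arithmetic stays signed, as intended. */
static int
payload_len_signed(int len)
{
	return (len - (int)sizeof(struct hdr));
}
```

A bounds check such as `if (len - sizeof(struct hdr) < 0)` therefore never fires, while the cast version behaves as written.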
Ruslan Ermilov
3a06e3e02c Do not call icmp_error() if ipfirewall(4) denied packet.
PR:		kern/10747, kern/18382
2000-05-15 18:41:01 +00:00
Jun-ichiro itojun Hagino
fdcb8debf6 correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.
similar to recent fix to sys/netinet/ipf.c (by darren).
2000-05-10 01:25:33 +00:00