vnodes. This will hopefully serve as a base from which we can
expand the MP code. We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
sysctl_req', which describes in-progress sysctl requests. This permits
sysctl handlers to have access to the current thread, permitting work
on implementing td->td_ucred, migration of suser() to using struct
thread to derive the appropriate ucred, and allowing struct thread to be
passed down to other code, such as network code where td is not currently
available (and curproc is used).
o Note: netncp and netsmb are not updated to reflect this change, as they
are not currently KSE-adapted.
Reviewed by: julian
Obtained from: TrustedBSD Project
"[...] and removes the hostcache code from standard kernels---the
code that depends on it is not going to happen any time soon,
I'm afraid."
Time to clean up.
called and ip_output() encounters an error and bails (i.e. host
unreachable), we will leak an mbuf. This is because the code calls
m_freem(m0) after jumping to the bad: label at the end of the function,
when it should be calling m_freem(m). (m0 is the original mbuf list
_without_ the options mbuf prepended.)
Obtained from: NetBSD
ifconfig, which expects the address returned by the SIOCGIFNETMASK ioctl
to have a valid sa_family. Similar changes may be necessary for IPv6.
While we're here, get rid of an unnecessary temp variable.
MFC after: 2 weeks
is calculated. this caused some trouble in the code which the ip header
is not modified. for example, inbound policy lookup failed.
Obtained from: KAME
MFC after: 1 week
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).
Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0
Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.
Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360
a single kern.security.seeotheruids_permitted, describes as:
"Unprivileged processes may see subjects/objects with different real uid"
NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is
an API change. kern.ipc.showallsockets does not.
- Check kern.security.seeotheruids_permitted in cr_cansee().
- Replace visibility calls to socheckuid() with cr_cansee() (retain
the change to socheckuid() in ipfw, where it is used for rule-matching).
- Remove prison_unpcb() and make use of cr_cansee() against the UNIX
domain socket credential instead of comparing root vnodes for the
UDS and the process. This allows multiple jails to share the same
chroot() and not see each others UNIX domain sockets.
- Remove unused socheckproc().
Now that cr_cansee() is used universally for socket visibility, a variety
of policies are more consistently enforced, including uid-based
restrictions and jail-based restrictions. This also better-supports
the introduction of additional MAC models.
Reviewed by: ps, billf
Obtained from: TrustedBSD Project
to send all its data, especially when the data is less than one MSS.
This fixes an issue where the stack was delaying the sending
of data, eventhough there was enough window to send all the data and
the sending of data was emptying the socket buffer.
Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)
Submitted by: Jayanth Vijayaraghavan
kern.ipc.showallsockets is set to 0.
Submitted by: billf (with modifications by me)
Inspired by: Dave McKay (aka pm aka Packet Magnet)
Reviewed by: peter
MFC after: 2 weeks
+ implement "limit" rules, which permit to limit the number of sessions
between certain host pairs (according to masks). These are a special
type of stateful rules, which might be of interest in some cases.
See the ipfw manpage for details.
+ merge the list pointers and ipfw rule descriptors in the kernel, so
the code is smaller, faster and more readable. This patch basically
consists in replacing "foo->rule->bar" with "rule->bar" all over
the place.
I have been willing to do this for ages!
MFC after: 1 week
not referenced in Stevens, and does not compile with g++.
There is an equivalent structure, struct ipoption in ip_var.h
which is actually used in various parts of the kernel, and also referenced
in Stevens.
Bill Fenner also says:
... if you want the trivia, struct ip_opts was introduced
in in.h SCCS revision 7.9, on 6/28/1990, by Mike Karels.
struct ipoption was introduced in ip_var.h SCCS revision 6.5,
on 9/16/1985, by... Mike Karels.
MFC-after: 3 days
NAT in extended passive mode if the server's public IP address was
different from the main NAT address. This caused a wrong aliasing
link to be created that did not route the incoming packets back to
the original IP address of the server.
natd -v -n pub0 -redirect_address localFTP publicFTP
Note that even if localFTP == publicFTP, one still needs to supply
the -redirect_address directive. It is needed as a helper because
extended passive mode's 229 reply does not contain the IP address.
MFC after: 1 week
and speed. No new functionality added (yet) apart from a bugfix.
MFC will occur in due time and probably in stages.
BUGFIX: fix a problem in old code which prevented reallocation of
the hash table for dynamic rules (there is a PR on this).
OTHER CHANGES: minor changes to the internal struct for static and dynamic rules.
Requires rebuild of ipfw binary.
Add comments to show how data structures are linked together.
(It probably makes no sense to keep the chain pointers separate
from actual rule descriptors. They will be hopefully merged soon.
keep a (sysctl-readable) counter for the number of static rules,
to speed up IP_FW_GET operations
initial support for a "grace time" for expired connections, so we
can set timeouts for closing connections to much shorter times.
merge zero_entry() and resetlog_entry(), they use basically the
same code.
clean up and reduce replication of code for removing rules,
both for readability and code size.
introduce a separate lifetime for dynamic UDP rules.
fix a problem in old code which prevented reallocation of
the hash table for dynamic rules (PR ...)
restructure dynamic rule descriptors
introduce some local variables to avoid multiple dereferencing of
pointer chains (reduces code size and hopefully increases speed).
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
new data is acknowledged, reset the dupacks to 0.
The problem was spotted when a connection had its send buffer full
because the congestion window was only 1 MSS and was not being incremented
because dupacks was not reset to 0.
Obtained from: Yahoo!
to the application as a RST would, this way we're compatible with the most
applications.
MFC candidate.
Submitted by: Scott Renfro <scott@renfro.org>
Reviewed by: Mike Silbersack <silby@silby.com>
about rules and dynamic rules. it later fills this buffer with these
rules.
it also takes the opporunity to compare the expiration of the dynamic
rules with the current time and either marks them for deletion or simply
charges the countdown.
unfortunatly it does this all (the sizing, the buffer copying, and the
expiration GC) with no spl protection whatsoever. it was possible for
the dynamic rule(s) to be ripped out from under the request before it
had completed, resulting in corrupt memory dereferencing.
Reviewed by: ps
MFC before: 4.4-RELEASE, hopefully.
In order to ensure security and functionality, RFC 1948 style
initial sequence number generation has been implemented. Barring
any major crypographic breakthroughs, this algorithm should be
unbreakable. In addition, the problems with TIME_WAIT recycling
which affect our currently used algorithm are not present.
Reviewed by: jesper
cdevsw entries have been for a long time.
Discover that we now have two version sof the same structure.
I will shoot one of them shortly when I figure out why someone thinks
they need it. (And I can prove they don't)
(netinet/ipprotosw.h should GO AWAY)
Avoid using parenthesis enclosure macros (.Pq and .Po/.Pc) with plain text.
Not only this slows down the mdoc(7) processing significantly, but it also
has an undesired (in this case) effect of disabling hyphenation within the
entire enclosed block.
making pcbs available to the outside world. otherwise, we will see
inpcb without ipsec security policy attached (-> panic() in ipsec.c).
Obtained from: KAME
MFC after: 3 days
- Use sysctl to export stats
- Use ip_encap.c's encapsulation support
- Update lkm to kld (is 6 years a record for a broken module?)
- Remove some unused cruft
This macro was supposed to only match local IP addresses of
interfaces, and all consumers of this macro assume this as
well. (See IP_MULTICAST_IF and IP_ADD_MEMBERSHIP socket
options in the ip(4) manpage.)
This fixes a major security breach in IPFW-based firewalls
where the `me' keyword would match the other end of a P2P
link.
PR: kern/28567
This should help us in nieve benchmark "tests".
It seems a wide number of people think 32k buffers would not cause major
issues, and is in fact in use by many other OS's at this time. The
receive buffers can be bumped higher as buffers are hardly used and several
research papers indicate that receive buffers rarely use much space at all.
Submitted by: Leo Bicknell <bicknell@ufp.org>
<20010713101107.B9559@ussenterprise.ufp.org>
Agreed to in principle by: dillon (at the 32k level)
generation scheme. Users may now select between the currently used
OpenBSD algorithm and the older random positive increment method.
While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT
handling; this is causing trouble for an increasing number of folks.
To switch between generation schemes, one sets the sysctl
net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments,
1 = the OpenBSD algorithm. 1 is still the default.
Once a secure _and_ compatible algorithm is implemented, this sysctl
will be removed.
Reviewed by: jlemon
Tested by: numerous subscribers of -net
RTF_DYNAMIC route, it got freed twice). I am not sure what was
the actual problem in 1992, but the current behavior is memory
leak if PCB holds a reference to a dynamically created/modified
routing table entry. (rt_refcnt>0 and we don't call rtfree().)
My test bed was:
1. Set net.inet.tcp.msl to a low value (for test purposes), e.g.,
5 seconds, to speed up the transition of TCP connection to a
"closed" state.
2. Add a network route which causes ICMP redirect from the gateway.
3. ping(8) host H that matches this route; this creates RTF_DYNAMIC
RTF_HOST route to H. (I was forced to use ICMP to cause gateway
to generate ICMP host redirect, because gateway in question is a
4.2-STABLE system vulnerable to a problem that was fixed later in
ip_icmp.c,v 1.39.2.6, and TCP packets with DF bit set were
triggering this bug.)
4. telnet(1) to H
5. Block access to H with ipfw(8)
6. Send something in telnet(1) session; this causes EPERM, followed
by an in_losing() call in a few seconds.
7. Delete ipfw(8) rule blocking access to H, and wait for TCP
connection moving to a CLOSED state; PCB is freed.
8. Delete host route to H.
9. Watch with netstat(1) that `rttrash' increased.
10. Repeat steps 3-9, and watch `rttrash' increases.
PR: kern/25421
MFC after: 2 weeks
only do getcred calls for sockets which were created in the same jail.
This should allow the ident to work in a reasonable way within jails.
PR: 28107
Approved by: des, rwatson
connection. The information contained in a tcptemp can be
reconstructed from a tcpcb when needed.
Previously, tcp templates required the allocation of one
mbuf per connection. On large systems, this change should
free up a large number of mbufs.
Reviewed by: bmilekic, jlemon, ru
MFC after: 2 weeks
are duplicated by newly defined types/options in RFC3121
- We have no backward compatibility issue. There is no apps in our
distribution which use the above types/options.
Obtained from: KAME
MFC after: 2 weeks
sizeof(ro_dst) is not necessarily the correct one.
this change would also fix the recent path MTU discovery problem for the
destination of an incoming TCP connection.
Submitted by: JINMEI Tatuya <jinmei@kame.net>
Obtained from: KAME
MFC after: 2 weeks
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.
TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.
Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks
around, use a common function for looking up and extracting the tunables
from the kernel environment. This saves duplicating the same function
over and over again. This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
One way we can reduce the amount of traffic we send in response to a SYN
flood is to eliminate the RST we send when removing a connection from
the listen queue. Since we are being flooded, we can assume that the
majority of connections in the queue are bogus. Our RST is unwanted
by these hosts, just as our SYN-ACK was. Genuine connection attempts
will result in hosts responding to our SYN-ACK with an ACK packet. We
will automatically return a RST response to their ACK when it gets to us
if the connection has been dropped, so the early RST doesn't serve the
genuine class of connections much. In summary, we can reduce the number
of packets we send by a factor of two without any loss in functionality
by ensuring that RST packets are not sent when dropping a connection
from the listen queue.
Submitted by: Mike Silbersack <silby@silby.com>
Reviewed by: jesper
MFC after: 2 weeks
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.
By setting a upper limit on the number of reassembly queues we
prevent this situation.
This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to 200,
as the IPv6 case, this should be sufficient for most
systmes, but you might want to increase it if you have
lots of TCP sessions.
I'm working on making the default value dependent on
nmbclusters.
If you want old behaviour (no upper limit) set this sysctl
to a negative value.
If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero).
Obtained from: NetBSD
MFC after: 1 week
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.
Reviewed by: -net
Obtained from: OpenBSD
A attacker sending a lot of bogus fragmented packets to the target
(with different IPv4 identification field - ip_id), may be able
to put the target machine into mbuf starvation state.
By setting a upper limit on the number of reassembly queues we
prevent this situation.
This upper limit is controlled by the new sysctl
net.inet.ip.maxfragpackets which defaults to NMBCLUSTERS/4
If you want old behaviour (no upper limit) set this sysctl
to a negative value.
If you don't want to accept any fragments (not recommended)
set the sysctl to 0 (zero)
Obtained from: NetBSD (partially)
MFC after: 1 week
any response to our third SYN to work-around some broken
terminal servers (most of which have hopefully been retired)
that have bad VJ header compression code which trashes TCP
segments containing unknown-to-them TCP options.
PR: kern/1689
Submitted by: jesper
Reviewed by: wollman
MFC after: 2 weeks
For FTP control connection, keep the CRLF end-of-line termination
status in there.
Fixed the bug when the first FTP command in a session was ignored.
PR: 24048
MFC after: 1 week
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
Change code from PRC_UNREACH_ADMIN_PROHIB to PRC_UNREACH_PORT for
ICMP_UNREACH_PROTOCOL and ICMP_UNREACH_PORT
And let TCP treat PRC_UNREACH_PORT like PRC_UNREACH_ADMIN_PROHIB
This should fix the case where port unreachables for udp returned
ENETRESET instead of ECONNREFUSED
Problem found by: Bill Fenner <fenner@research.att.com>
Reviewed by: jlemon
sysctl, net.inet.ip.fw.permanent_rules.
This allows you to install rules that are persistent across flushes,
which is very useful if you want a default set of rules that
maintains your access to remote machines while you're reconfiguring
the other rules.
Reviewed by: Mark Murray <markm@FreeBSD.org>
very specific scenarios, and now that we have had net.inet.tcp.blackhole for
quite some time there is really no reason to use it any more.
(last of three commits)
using it. Not checking this may have caused the wrong IP address to be
used when processing certain IP options (see example below). This also
caused the wrong route to be passed to ip_output() when forwarding, but
fortunately ip_output() is smart enough to detect this.
This example demonstrates the wrong behavior of the Record Route option
observed with this bug. Host ``freebsd'' is acting as the gateway for
the ``sysv''.
1. On the gateway, we add the route to the destination. The new route
will use the primary address of the loopback interface, 127.0.0.1:
: freebsd# route add 10.0.0.66 -iface lo0 -reject
: add host 10.0.0.66: gateway lo0
2. From the client, we ping the destination. We see the correct replies.
Please note that this also causes the relevant route on the ``freebsd''
gateway to be cached in ipforward_rt variable:
: sysv# ping -snv 10.0.0.66
: PING 10.0.0.66: 56 data bytes
: ICMP Host Unreachable from gateway 192.168.0.115
: ICMP Host Unreachable from gateway 192.168.0.115
: ICMP Host Unreachable from gateway 192.168.0.115
:
: ----10.0.0.66 PING Statistics----
: 3 packets transmitted, 0 packets received, 100% packet loss
3. On the gateway, we delete the route to the destination, thus making
the destination reachable through the `default' route:
: freebsd# route delete 10.0.0.66
: delete host 10.0.0.66
4. From the client, we ping destination again, now with the RR option
turned on. The surprise here is the 127.0.0.1 in the first reply.
This is caused by the bug in ip_rtaddr() not checking the cached
route is still up befor use. The debug code also shows that the
wrong (down) route is further passed to ip_output(). The latter
detects that the route is down, and replaces the bogus route with
the valid one, so we see the correct replies (192.168.0.115) on
further probes:
: sysv# ping -snRv 10.0.0.66
: PING 10.0.0.66: 56 data bytes
: 64 bytes from 10.0.0.66: icmp_seq=0. time=10. ms
: IP options: <record route> 127.0.0.1, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
: 64 bytes from 10.0.0.66: icmp_seq=1. time=0. ms
: IP options: <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
: 64 bytes from 10.0.0.66: icmp_seq=2. time=0. ms
: IP options: <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:
: ----10.0.0.66 PING Statistics----
: 3 packets transmitted, 3 packets received, 0% packet loss
: round-trip (ms) min/avg/max = 0/3/10
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.
Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.
This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.
netinet/in.c, netinet/in_rmx.c:
When address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)
Note, that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.
PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)
is transmitted as all ones". This got broken after introduction
of delayed checksums as follows. Some guys (including Jonathan)
think that it is allowed to transmit all ones in place of a zero
checksum for TCP the same way as for UDP. (The discussion still
takes place on -net.) Thus, the 0 -> 0xffff checksum fixup was
first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65)
to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18,
INVERT expression). Besides that I disagree that it is valid for
TCP, there was no real problem until in_cksum.c,v 1.20, where the
in_cksum() was made just a special version of in_cksum_skip().
The side effect was that now every incoming IP datagram failed to
pass the checksum test (in_cksum() returned 0xffff when it should
actually return zero). It was fixed next day in revision 1.21,
by removing the INVERT expression. The latter also broke the
0 -> 0xffff fixup for UDP checksums.
Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 0000
After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 ffff
come from a dummynet pipe. Without this, the code which increments
the per-ifaddr stats can dereference an uninitialised pointer. This
should make dummynet usable again.
Reported by: "Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua>
Reviewed by: luigi, joe
on certain types of SOCK_RAW sockets. Also, use the ip.ttl MIB
variable instead of MAXTTL constant as the default time-to-live
value for outgoing IP packets all over the place, as we already
do this for TCP and UDP.
Reviewed by: wollman
an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.
We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().
Also:
- Calculate the correct number of bytes that need to be
retained for icmp_error(), instead of assuming that 64
is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
copy from the supplied mbuf chain, in case the first 8
bytes of IP payload are not stored directly after the IP
header.
- Sanity-check ip_len in icmp_error(), and panic if it is
less than sizeof(struct ip). Incoming packets with bad
ip_len values are discarded in ip_input(), so this should
only be triggered by bugs in the code, not by bad packets.
This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.
Reported by: Mike Tancsa <mike@sentex.net>
Reviewed by: ru, bmilekic, jlemon, dillon
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.
Add warning comments about the ip_checkinterface feature.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.
Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).
This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)
Submitted by: Mike Silbersack <silby@silby.com>
addressed to the interface on the other side of the box follow their
historical path.
Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().
Always check the arrival interface when matching the packet destination
against the interface broadcast addresses. This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side. This was broken
when the directed broadcast code was removed from revision 1.32. If
the directed broadcast code was stil present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.
Optimize the order of the tests.
Reviewed by: jlemon
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface. Skip this check if the packet was locally
generated.
When we recieve a fragmented TCP packet (other than the first) we can't
extract header information (we don't have state to reference). In a rather
unelegant fashion we just move on and assume a non-match.
Recent additions to the TCP header-specific section of the code neglected
to add the logic to the fragment code so in those cases the match was
assumed to be positive and those parts of the rule (which should have
resulted in a non-match/continue) were instead skipped (which means
the processing of the rule continued even though it had already not
matched).
Fault can be spread out over Rich Steenbergen (tcpoptions) and myself
(tcp{seq,ack,win}).
rwatson sent me a patch that got me thinking about this whole situation
(but what I'm committing / this description is mine so don't blame him).
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.
Clean up some layering violations (access to tcp internals from in_pcb)
This piece of code has not been referenced since it was put there
in 1995. Also done a codebased search on popular networking libraries
and third-party applications. This is an orphan.
Reviewed by: jesper
connection, but send it immediately. Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
error will be passed up to the user, who will close the connection, so
it does not appear to make a sense to leave the connection open.
This also fixes a bug with kqueue, where the filter does not set EOF
on the connection, because the connection is still open.
Also remove calls to so{rw}wakeup, as we aren't doing anything with
them at the moment anyway.
Reviewed by: alfred, jesper
reset TCP connections which are in the SYN_SENT state, if the sequence
number in the echoed ICMP reply is correct. This behavior can be
controlled by the sysctl net.inet.tcp.icmp_may_rst.
Currently, only subtypes 2,3,10,11,12 are treated as such
(port, protocol and administrative unreachables).
Assocaiate an error code with these resets which is reported to the
user application: ENETRESET.
Disallow resetting TCP sessions which are not in a SYN_SENT state.
Reviewed by: jesper, -net
and 1.84 of src/sys/netinet/udp_usrreq.c
The changes broken down:
- remove 0 as a wildcard for addresses and port numbers in
src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify
all sessions with the specific remote address.
- change
- src/sys/netinet/udp_usrreq.c:udp_ctlinput()
- src/sys/netinet/tcp_subr.c:tcp_ctlinput()
to use in_pcbnotifyall() to notify multiple sessions, instead of
using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
- src/sys/netinet/tcp_subr.c:tcp_ctlinput()
- src/sys/netinet/udp_usrreq.c:udp_ctlinput()
as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to
udp_ctlinput() and tcp_ctlinput(), so they will call
in_pcbnotifyall() to notify all sessions with the specific
remote address.
Approved by: jlemon
Inspired by: NetBSD
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.
Notes:
o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.
Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project
treat 0 as a wildcard in src/sys/in_pbc.c:in_pcbnotify()
It's sufficient to check for src|local port, as we'll have no
sessions with src|local port == 0
Without this a attacker sending ICMP messages, where the attached
IP header (+ 8 bytes) has the address and port numbers == 0, would
have the ICMP message applied to all sessions.
PR: kern/25195
Submitted by: originally by jesper, reimplimented by jlemon's advice
Reviewed by: jlemon
Approved by: jlemon
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.
Reviewed by: bde
Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h
Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input
In src/sys/netinet/ip_icmp.c:icmp_input set code = PRC_UNREACH_ADMIN_PROHIB
or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG
Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst
to reflect the fact that we also react on ICMP unreachables that
are not administrative prohibited. Also update the comments to
reflect this.
In sys/netinet/tcp_subr.c:tcp_ctlinput add code to treat
PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST different.
PR: 23986
Submitted by: Jesper Skriver <jesper@skriver.dk>
address is configured on a interface. This is useful for routers with
dynamic interfaces. It is now possible to say:
0100 allow tcp from any to any established
0200 skipto 1000 tcp from any to any
0300 allow ip from any to any
1000 allow tcp from 1.2.3.4 to me 22
1010 deny tcp from any to me 22
1020 allow tcp from any to any
and not have to worry about the behaviour if dynamic interfaces configure
new IP numbers later on.
The check is semi expensive (traverses the interface address list)
so it should be protected as in the above example if high performance
is a requirement.
were performed to determine if the received packet should be reset. This
created erroneous ratelimiting and false alarms in some cases. The code
has now been reorganized so that the checks for validity come before
the call to badport_bandlim. Additionally, a few changes in the symbolic
names of the bandlim types have been made, as well as a clarification of
exactly which type each RST case falls under.
Submitted by: Mike Silbersack <silby@silby.com>
turned on, and the case of it not being defined at all.
i.e. Disabling bridging re-enables some of the checks it disables.
Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>
available, the error return should be EADDRNOTAVAIL rather than
EAGAIN.
PR: 14181
Submitted by: Dima Dorfman <dima@unixfreak.org>
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
splimp() -- we need it because dummynet can be invoked by the
bridging code at splimp().
This should cure the pipe "stalls" that several people have been
reporting on -stable while using bridging+dummynet (the problem
would not affect routers using dummynet).
reserved and now allocated TCP flags in incoming packets. This patch
stops overloading those bits in the IP firewall rules, and moves
colliding flags to a seperate field, ipflg. The IPFW userland
management tool, ipfw(8), is updated to reflect this change. New TCP
flags related to ECN are now included in tcp.h for reference, although
we don't currently implement TCP+ECN.
o To use this fix without completely rebuilding, it is sufficient to copy
ip_fw.h and tcp.h into your appropriate include directory, then rebuild
the ipfw kernel module, and ipfw tool, and install both. Note that a
mismatch between module and userland tool will result in incorrect
installation of firewall rules that may have unexpected effects. This
is an MFC candidate, following shakedown. This bug does not appear
to affect ipfilter.
Reviewed by: security-officer, billf
Reported by: Aragon Gouveia <aragon@phat.za.net>
to supress logging when ARP replies arrive on the wrong interface:
"/kernel: arp: 1.2.3.4 is on dc0 but got reply from 00:00:c5:79:d0:0c on dc1"
the default is to log just to give notice about possibly incorrectly
configured networks.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
messages send by routers when they deny our traffic, this causes
a timeout when trying to connect to TCP ports/services on a remote
host, which is blocked by routers or firewalls.
rfc1122 (Requirements for Internet Hosts) section 3.2.2.1 actually
requi re that we treat such a message for a TCP session, that we
treat it like if we had recieved a RST.
quote begin.
A Destination Unreachable message that is received MUST be
reported to the transport layer. The transport layer SHOULD
use the information appropriately; for example, see Sections
4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
that has its own mechanism for notifying the sender that a
port is unreachable (e.g., TCP, which sends RST segments)
MUST nevertheless accept an ICMP Port Unreachable for the
same purpose.
quote end.
I've written a small extension that implement this, it also create
a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control if
this new behaviour is activated.
When it's activated (set to 1) we'll treat a ICMP administratively
prohibited message (icmp type 3 code 9, 10 and 13) for a TCP
sessions, as if we recived a TCP RST, but only if the TCP session
is in SYN_SENT state.
The reason for only reacting when in SYN_SENT state, is that this
will solve the problem, and at the same time minimize the risk of
this being abused.
I suggest that we enable this new behaviour by default, but it
would be a change of current behaviour, so if people prefer to
leave it disabled by default, at least for now, this would be ok
for me, the attached diff actually have the sysctl set to 0 by
default.
PR: 23086
Submitted by: Jesper Skriver <jesper@skriver.dk>
1. ICMP ECHO and TSTAMP replies are now rate limited.
2. RSTs generated due to packets sent to open and unopen ports
are now limited by seperate counters.
3. Each rate limiting queue now has its own description, as
follows:
Limiting icmp unreach response from 439 to 200 packets per second
Limiting closed port RST response from 283 to 200 packets per second
Limiting open port RST response from 18724 to 200 packets per second
Limiting icmp ping response from 211 to 200 packets per second
Limiting icmp tstamp response from 394 to 200 packets per second
Submitted by: Mike Silbersack <silby@silby.com>
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.
IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.
Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
* Some dummynet code incorrectly handled a malloc()-allocated pseudo-mbuf
header structure, called "pkt," and could consequently pollute the mbuf
free list if it was ever passed to m_freem(). The fix involved passing not
pkt, but essentially pkt->m_next (which is a real mbuf) to the mbuf
utility routines.
* Also, for dummynet, in bdg_forward(), made the code copy the ethernet header
back into the mbuf (prepended) because the dummynet code that follows expects
it to be there but it is, unfortunately for dummynet, passed to bdg_forward
as a seperate argument.
PRs: kern/19551 ; misc/21534 ; kern/23010
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: bmilekic
Approved by: luigi
instead.
Also, fix a small set of "avail." If we're setting `avail,' we shouldn't
be re-checking whether m_flags is M_EXT, because we know that it is, as if
it wasn't, we would have already returned several lines above.
Reviewed by: jlemon
only be checked if the system is currently performing New Reno style
fast recovery. However, this value was being checked regardless of the
NR state, with the end result being that the congestion window was never
opened.
Change the logic to check t_dupack instead; the only code path that
allows it to be nonzero at this point is NewReno, so if it is nonzero,
we are in fast recovery mode and should not touch the congestion window.
Tested by: phk
PPTP links are no longer dropped by simple (and inappropriate in this
case) "inactivity timeout" procedure, only when requested through the
control connection.
It is now possible to have multiple PPTP servers running behind NAT.
Just redirect the incoming TCP traffic to port 1723, everything else
is done transparently.
Problems were reported and the fix was tested by:
Michael Adler <Michael.Adler@compaq.com>,
David Andersen <dga@lcs.mit.edu>
<sys/proc.h> to <sys/systm.h>.
Correctly document the #includes needed in the manpage.
Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.
Define __offsetof() in <machine/ansi.h>
Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>
Remove myriad of local offsetof() definitions.
Remove includes of <stddef.h> in kernel code.
NB: Kernelcode should *never* include from /usr/include !
Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.
Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.
Paritials reviews by: various.
Significant brucifications by: bde
of IP datagram. This fixes the problem when firewall denied fragmented
packets whose last fragment was less than minimum protocol header size.
Found by: Harti Brandt <brandt@fokus.gmd.de>
PR: kern/22309
in the code enforces this. So, do not check for and attempt a
false reassembly if only IP_RF is set.
Also, removed the dead code, since we no longer use dtom() on
return from ip_reass().
statistics on a per network address basis.
Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.
Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.
and instead reapply the revision 1.49 of mbuf.h, i.e.
Fixed regression of the type of the `header' member of struct pkthdr from
`void *' to caddr_t in rev.1.51. This mainly caused an annoying warning
for compiling ip_input.c.
Requested by: bde
enough into the mbuf data area. Solve this problem once and for all
by pulling up the entire (standard) header for TCP and UDP, and four
bytes of header for ICMP (enough for type, code and cksum fields).
the IP_FW_IF_IPID rule. (We have recently decided to keep the
ip_id field in network byte order inside the kernel, see revision
1.140 of src/sys/netinet/ip_input.c).
I did not like to have the conversion happen in userland, and I
think that the similar conversions for fw_tcp(seq|ack|win) should
be moved out of userland (src/sbin/ipfw/ipfw.c) into the kernel.
in favor of the new-style per-vif socket.
this does not affect the behavior of the ISI rsvpd but allows
another rsvp implementation (e.g., KOM rsvp) to take advantage
of the new style for particular sockets while using the old style
for others.
in the future, rsvp supporn should be replaced by more generic
router-alert support.
PR: kern/20984
Submitted by: Martin Karsten <Martin.Karsten@KOM.tu-darmstadt.de>
Reviewed by: kjc
but have a network interrupt arrive and deactivate the timeout before
the callout routine runs. Check for this case in the callout routine;
it should only run if the callout is active and not on the wheel.
It causes a panic when/if snd_una is incremented elsewhere (this
is a conservative change, because originally no rollback occurred
for any packets at all).
Submitted by: Vivek Sadananda Pai <vivek@imimic.com>
Update copyrights.
Introduce a new sysctl node:
net.inet.accf
Although acceptfilters need refcounting to be properly (safely) unloaded
as a temporary hack allow them to be unloaded if the sysctl
net.inet.accf.unloadable is set, this is really for developers who want
to work on thier own filters.
A near complete re-write of the accf_http filter:
1) Parse check if the request is HTTP/1.0 or HTTP/1.1 if not dump
to the application.
Because of the performance implications of this there is a sysctl
'net.inet.accf.http.parsehttpversion' that when set to non-zero
parses the HTTP version.
The default is to parse the version.
2) Check if a socket has filled and dump to the listener
3) optimize the way that mbuf boundries are handled using some voodoo
4) even though you'd expect accept filters to only be used on TCP
connections that don't use m_nextpkt I've fixed the accept filter
for socket connections that use this.
This rewrite of accf_http should allow someone to use them and maintain
full HTTP compliance as long as net.inet.accf.http.parsehttpversion is
set.
for them does not belong in the IP_FW_F_COMMAND switch, that mask doesn't even
apply to them(!).
2. You cannot add a uid/gid rule to something that isn't TCP, UDP, or IP.
XXX - this should be handled in ipfw(8) as well (for more diagnostic output),
but this at least protects bogus rules from being added.
Pointy hat: green
datagram embedded into ICMP error message, not with protocol
field of ICMP message itself (which is always IPPROTO_ICMP).
Pointed by: Erik Salander <erik@whistle.com>
fields between host and network byte order. The details:
o icmp_error() now does not add IP header length. This fixes the problem
when icmp_error() is called from ip_forward(). In this case the ip_len
of the original IP datagram returned with ICMP error was wrong.
o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host
byte order, so DTRT and convert these fields back to network byte order
before sending a message. This fixes the problem described in PR 16240
and PR 20877 (ip_id field was returned in host byte order).
o ip_ttl decrement operation in ip_forward() was moved down to make sure
that it does not corrupt the copy of original IP datagram passed later
to icmp_error().
o A copy of original IP datagram in ip_forward() was made a read-write,
independent copy. This fixes the problem I first reported to Garrett
Wollman and Bill Fenner and later put in audit trail of PR 16240:
ip_output() (not always) converts fields of original datagram to network
byte order, but because copy (mcopy) and its original (m) most likely
share the same mbuf cluster, ip_output()'s manipulations on original
also corrupted the copy.
o ip_output() now expects all three fields, ip_len, ip_off and (what is
significant) ip_id in host byte order. It was a headache for years that
ip_id was handled differently. The only compatibility issue here is the
raw IP socket interface with IP_HDRINCL socket option set and a non-zero
ip_id field, but ip.4 manual page was unclear on whether in this case
ip_id field should be in host or network byte order.
not alias `ip_src' unless it comes from the host an original
datagram that triggered this error message was destined for.
PR: 20712
Reviewed by: brian, Charles Mott <cmott@scientech.com>
When this happens, we know for sure that the packet data was not
received by the peer. Therefore, back out any advancing of the
transmit sequence number so that we send the same data the next
time we transmit a packet, avoiding a guaranteed missed packet and
its resulting TCP transmit slowdown.
In most systems ip_output() probably never returns an error, and
so this problem is never seen. However, it is more likely to occur
with device drivers having short output queues (causing ENOBUFS to
be returned when they are full), not to mention low memory situations.
Moreover, because of this problem writers of slow devices were
required to make an unfortunate choice between (a) having a relatively
short output queue (with low latency but low TCP bandwidth because
of this problem) or (b) a long output queue (with high latency and
high TCP bandwidth). In my particular application (ISDN) it took
an output queue equal to ~5 seconds of transmission to avoid ENOBUFS.
A more reasonable output queue of 0.5 seconds resulted in only about
50% TCP throughput. With this patch full throughput was restored in
the latter case.
Reviewed by: freebsd-net
delete the cloned route that is associated with the connection.
This does not exhaust the routing table memory when the system
is under a SYN flood attack. The route entry is not deleted if there
is any prior information cached in it.
Reviewed by: Peter Wemm,asmodai
reply if the requesting machine isn't on the interface we believe
it should be. Prevents arp wars when you plug cables in the wrong
way around.
PR: 9848
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Not objected to by: wollman
- Multiple PPTP clients behind NAT to the same or different servers.
- Single PPTP server behind NAT -- you just need to redirect TCP
port 1723 to a local machine. Multiple servers behind NAT is
possible but would require a simple API change.
- No API changes!
For more information on how this works see comments at the start of
the alias_pptp.c.
PacketAliasPptp() is no longer necessary and will be removed soon.
Submitted by: Erik Salander <erik@whistle.com>
Reviewed by: ru
Rewritten by: ru
Reviewed by: Erik Salander <erik@whistle.com>
accept filters are now loadable as well as able to be compiled into
the kernel.
two accept filters are provided, one that returns sockets when data
arrives the other when an http request is completed (doesn't work
with 0.9 requests)
Reviewed by: jmg
It does mean that it is now possible to run passive-mode FTP
server behind NAT.
- SECURITY: FTP aliasing engine now ensures that:
o the segment preceding a PORT/227 segment terminates with a \r\n;
o the IP address in the PORT/227 matches the source IP address of
the packet;
o the port number in the PORT command or 277 reply is greater than
or equal to 1024.
Submitted by: Erik Salander <erik@whistle.com>
Reviewed by: ru
It also squashes 99% of packet kiddie synflood orgies. For example, to
rate syn packets without MSS,
ipfw pipe 10 config 56Kbit/s queue 10Packets
ipfw add pipe 10 tcp from any to any in setup tcpoptions !mss
Submitted by: Richard A. Steenbergen <ras@e-gerbil.net>
a mbuf, it may return without setting any timers. If no more data is
scheduled to be transmitted (this was a FIN) the system will sit in
LAST_ACK state forever.
Thus, when mbuf allocation fails, set the retransmit timer if neither
the retransmit or persist timer is already pending.
Problem discovered by: Mike Silbersack (silby@silby.com)
Pushed for a fix by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: jayanth
down as a result of a reset. Returning EINVAL in that case makes no
sense at all and just confuses people as to what happened. It could be
argued that we should save the original address somewhere so that
getsockname() etc can tell us what it used to be so we know where the
problem connection attempts are coming from.
integer expression. Otherwise the sizeof() call will force the expression
to be evaluated as unsigned, which is not the intended behavior.
Obtained from: NetBSD (in a different form)
code retransmitting data from the wrong offset.
As a footnote, the newreno code was partially derived from NetBSD
and Tom Henderson <tomh@cs.berkeley.edu>
of the individual drivers and into the common routine ether_input().
Also, remove the (incomplete) hack for matching ethernet headers
in the ip_fw code.
The good news: net result of 1016 lines removed, and this should make
bridging now work with *all* Ethernet drivers.
The bad news: it's nearly impossible to test every driver, especially
for bridging, and I was unable to get much testing help on the mailing
lists.
Reviewed by: freebsd-net
better recovery for multiple packet losses in a single window.
The algorithm can be toggled via the sysctl net.inet.tcp.newreno,
which defaults to "on".
Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
calling in_pcbbind so that in_pcbbind sees a valid address if no
address was specified (since divert sockets ignore them).
PR: 17552
Reviewed by: Brian
to PPTP) with more generic PacketAliasRedirectProto().
Major number is not bumped because it is believed that noone
has started using PacketAliasRedirectPptp() yet.
LSNAT links are first created by either PacketAliasRedirectPort() or
PacketAliasRedirectAddress() and then set up by one or more calls to
PacketAliasAddServer().
Without this fix, all IPv6 TCP RST packet has wrong cksum value,
so IPv6 connect() trial to 5.0 machine won't fail until tcp connect timeout,
when they should fail soon.
Thanks to haro@tk.kubota.co.jp (Munehiro Matsuda) for his much debugging
help and detailed info.
connections, after SYN packets were seen from both ends. Before this,
it would get applied right after the first SYN packet was seen (either
from client or server). With broken TCP connection attempts, when the
remote end does not respond with SYNACK nor with RST, this resulted in
having a useless (ie, no actual TCP connection associated with it) TCP
link with 86400 seconds TTL, wasting system memory. With high rate of
such broken connection attempts (for example, remote end simply blocks
these connection attempts with ipfw(8) without sending RST back), this
could result in a denial-of-service.
PR: bin/17963
but with `dst_port' work for outgoing packets.
This case was not handled properly when I first fixed this
in revision 1.17.
This change is also required for the upcoming improved PPTP
support patches -- that is how I found the problem.
Before this change:
# natd -v -a aliasIP \
-redirect_port tcp localIP:localPORT publicIP:publicPORT 0:remotePORT
Out [TCP] [TCP] localIP:localPORT -> remoteIP:remotePORT aliased to
[TCP] aliasIP:localPORT -> remoteIP:remotePORT
After this change:
# natd -v -a aliasIP \
-redirect_port tcp localIP:localPORT publicIP:publicPORT 0:remotePORT
Out [TCP] [TCP] localIP:localPORT -> remoteIP:remotePORT aliased to
[TCP] publicIP:publicPORT -> remoteIP:remotePORT
INADDR_NONE: Incoming packets go to the alias address (the default)
INADDR_ANY: Incoming packets are not NAT'd (direct access to the
internal network from outside)
anything else: Incoming packets go to the specified address
Change a few inaddr::s_addr == 0 to inaddr::s_addr == INADDR_ANY
while I'm there.
redirected and when no target address has been specified, NAT
the destination address to the alias address rather than
allowing people direct access to your internal network from
outside.
mbuf is marked for delayed checksums, then additionally mark the
packet as having it's checksums computed. This allows us to bypass
computing/checking the checksum entirely, which isn't really needeed
as the packet has never hit the wire.
Reviewed by: green
Reported in Usenet by: locke@mcs.net (Peter Johnson)
While i was at it, prepended a 0x to the %D output, to make it clear that
the printed value is in hex (i assume %D has been chosen over %#x to
obey network byte order).
improperly doing the equivalent of (m = (function() == NULL)) instead
of ((m = function()) == NULL).
This fixes a NULL pointer dereference panic with runt arp packets.
Remove a bogus (redundant, just weird, etc.) key_freeso(so).
There are no consumers of it now, nor does it seem there
ever will be.
in6?_pcb.c:
Add an if (inp->in6?p_sp != NULL) before the call to
ipsec[46]_delete_pcbpolicy(inp). In low-memory conditions
this can cause a crash because in6?_sp can be NULL...
from iso88025.h.
o Add minimal llc support to iso88025_input.
o Clean up most of the source routing code.
* Submitted by: Nikolai Saoukh <nms@otdel-1.org>
Now most big problem of IPv6 is getting IPv6 address
assignment.
6to4 solve the problem. 6to4 addr is defined like below,
2002: 4byte v4 addr : 2byte SLA ID : 8byte interface ID
The most important point of the address format is that an IPv4 addr
is embeded in it. So any user who has IPv4 addr can get IPv6 address
block with 2byte subnet space. Also, the IPv4 addr is used for
semi-automatic IPv6 over IPv4 tunneling.
With 6to4, getting IPv6 addr become dramatically easy.
The attached patch enable 6to4 extension, and confirmed to work,
between "Richard Seaman, Jr." <dick@tar.com> and me.
Approved by: jkh
Reviewed by: itojun
ARP packets. This can incorrectly reject complete frames since the frame
could be stored in more than one mbuf.
The following patches fix the length comparisson, and add several
diagnostic log messages to the interrupt handler for out-of-the-norm ARP
packets. This should make ARP problems easier to detect, diagnose and
fix.
Submitted by: C. Stephen Gunn <csg@waterspout.com>
Approved by: jkh
Reviewed by: rwatson
Without this, kernel will panic at getsockopt() of IPSEC_POLICY.
Also make compilable libipsec/test-policy.c which tries getsockopt() of
IPSEC_POLICY.
Approved by: jkh
Submitted by: sakane@kame.net
KAME put INET6 related stuff into sys/netinet6 dir, but IPv6
standard API(RFC2553) require following files to be under sys/netinet.
netinet/ip6.h
netinet/icmp6.h
Now those header files just include each following files.
netinet6/ip6.h
netinet6/icmp6.h
Also KAME has netinet6/in6.h for easy INET6 common defs
sharing between different BSDs, but RFC2553 requires only
netinet/in.h should be included from userland.
So netinet/in.h also includes netinet6/in6.h inside.
To keep apps portability, apps should not directly include
above files from netinet6 dir.
Ideally, all contents of,
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h
should be moved into
netinet/ip6.h
netinet/icmp6.h
netinet/in.h
but to avoid big changes in this stage, add some hack, that
-Put some special macro define into those files under neitnet
-Let files under netinet6 cause error if it is included
from some apps, and, if the specifal macro define is not
defined.
(which should have been defined if files under netinet is
included)
-And let them print an error message which tells the
correct name of the include file to be included.
Also fix apps which includes invalid header files.
Approved by: jkh
Obtained from: KAME project
at the same time.
When rfc1323 and rfc1644 option are enabled by sysctl,
and tcp over IPv6 is tried, kernel panic happens by the
following check in tcp_output(), because now hdrlen is bigger
in such case than before.
/*#ifdef DIAGNOSTIC*/
if (max_linkhdr + hdrlen > MHLEN)
panic("tcphdr too big");
/*#endif*/
So change the above check to compare with MCLBYTES in #ifdef INET6 case.
Also, allocate a mbuf cluster for the header mbuf, in that case.
Bug reported at KAME environment.
Approved by: jkh
Reviewed by: sumikawa
Obtained from: KAME project
This is fix to usr.sbin/trpt and tcp_debug.[ch]
I think of putting this after 4.0 but,,,
-There was bug that when INET6 is defined,
IPv4 socket is not traced by trpt.
-I received request from a person who distribute a program
which use tcp_debug interface and print performance statistics,
that
-leave comptibility with old program as much as possible
-use same interface with other OSes
So, I talked with itojun, and synced API with netbsd IPv6 extension.
makeworld check, kernel build check(includes GENERIC) is done.
But if there happen to any problem, please let me know and
I soon backout this change.
o Drop all broadcast and multicast source addresses in tcp_input.
o Enable ICMP_BANDLIM in GENERIC.
o Change default to 200/s from 100/s. This will still stop the attack, but
is conservative enough to do this close to code freeze.
This is not the optimal patch for the problem, but is likely the least
intrusive patch that can be made for this.
Obtained from: Don Lewis and Matt Dillon.
Reviewed by: freebsd-security
for IPv4 communication.(IPv4 mapped IPv6 addr.)
Also removed IPv6 hoplimit initialization because it is alway done at
tcp_output.
Confirmed by: Bernd Walter <ticso@cicely5.cicely.de>
include this in all kernels. Declare some const *intrq_present
variables that can be checked by a module prior to using *intrq
to queue data.
Make the if_tun module capable of processing atm, ip, ip6, ipx,
natm and netatalk packets when TUNSIFHEAD is ioctl()d on.
Review not required by: freebsd-hackers
-opt_ipsec.h was missing on some tcp files (sorry for basic mistake)
-made buildable as above fix
-also added some missing IPv4 mapped IPv6 addr consideration into
ipsec4_getpolicybysock
This must be one of the reason why connections over IPsec hangs for
bigger packets.(which was reported on freebsd-current@freebsd.org)
But there still seems to be another bug and the problem is not yet fixed.
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.
s/__ss_family/ss_family/
s/__ss_len/ss_len/
Makeworld is confirmed, and no application should be affected by this change
yet.
now you can dynamically create rate-limited queues for different
flows using masks on dst/src IP, port and protocols.
Read the ipfw(8) manpage for details and examples.
Restructure the internals of the traffic shaper to use heaps,
so that it manages efficiently large number of queues.
Fix a bug which was present in the previous versions which could
cause, under certain unfrequent conditions, to send out very large
bursts of traffic.
All in all, this new code is much cleaner than the previous one and
should also perform better.
Work supported by Akamba Corp.
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.
desperation measure in low-memory situations), walk the tcpbs and
flush the reassembly queues.
This behaviour is currently controlled by the debug.do_tcpdrain sysctl
(defaults to on).
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: wollman
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
to print out protocol specific pcb info.
A patch submitted by guido@gvr.org, and asmodai@wxs.nl also reported
the problem.
Thanks and sorry for your troubles.
Submitted by: guido@gvr.org
Reviewed by: shin
packet divert at kernel for IPv6/IPv4 translater daemon
This includes queue related patch submitted by jburkhol@home.com.
Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
the old one: an unnecessary define (KLD_MODULE) has been deleted and
the initialisation of the module is done after domaininit was called
to be sure inet is running.
Some slight changed were made to ip_auth.c and ip_state.c in order
to assure including of sys/systm.h in case we make a kld
Make sure ip_fil does nmot include osreldate in kernel mode
Remove mlfk_ipl.c from here: no sources allowed in these directories!
- Implement 'ipfw tee' (finally)
- Divert packets by calling new function divert_packet() directly instead
of going through protosw[].
- Replace kludgey global variable 'ip_divert_port' with a function parameter
to divert_packet()
- Replace kludgey global variable 'frag_divert_port' with a function parameter
to ip_reass()
- style(9) fixes
Reviewed by: julian, green
This results in closer behavior to earlier versions, where the fixed
200ms timer actually resulted in a delay anywhere from 1..200ms, with
the average delay being 100ms.
Pointed out by: dg
for IPv6 yet)
With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
to be dangerous. It will better serve us as a port building a KLD,
ala SKIP.
The hooks are staying although it would be better to port and use
the NetBSD pfil interface rather than have custom hooks.
the link are equal to the default aliasing address. Do not zero them!
This will fix the problem with non-working links added with the source
and/or aliasing address equal to the default aliasing address, but the
default aliasing address is set later, after the link has been set up,
like both natd(8) and ppp(8) do (for objective reasons).
Reviewed by: Brian Somers <brian@FreeBSD.org>,
Eivind Eklund <eivind@FreeBSD.org>,
Charles Mott <cmott@srv.net>
have been there in the first place. A GENERIC kernel shrinks almost 1k.
Add a slightly different safetybelt under nostop for tty drivers.
Add some missing FreeBSD tags
`dst_port') work for outgoing packets.
- Make permanent links whose `alias_addr' matches the primary aliasing
address `aliasAddress' work for incoming packets.
- Typo fixes.
Reviewed by: brian, eivind
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.
In the words of originator:
:If an incoming connection is initiated through natd and deny_incoming is
:not set, then a new alias_link structure is created to handle the link.
:If there is nothing listening for the incoming connection, then the kernel
:responds with a RST for the connection. However, this is not processed
:correctly in libalias/alias.c:TcpMonitor{In,Out} and
:libalias/alias_db.c:SetState{In,Out} as it thinks a connection
:has been established and therefore applies a timeout of 86400 seconds
:to the link.
:
:If many of these half-connections are initiated (during, for example, a
:port scan of the host), then many thousands of unnecessary links are
:created and the resident size of natd balloons to 20MB or more.
PR: 13639
Reviewed by: brian
- eliminate the fast/slow timeout lists for TCP and instead use a
callout entry for each timer.
- increase the TCP timer granularity to HZ
- implement "bad retransmit" recovery, as presented in
"On Estimating End-to-End Network Path Properties", by Allman and Paxson.
Submitted by: jlemon, wollmann
using syslog(3) (log(9)) for its various purposes! This long-awaited
change also includes such nice things as:
* macros expanding into _two_ comma-delimited arguments!
* snprintf!
* more snprintf!
* linting and criticism by more people than you can shake a stick at!
* a slightly more uniform message style than before!
and last but not least
* no less than 5 rewrites!
Reviewed by: committers
drop any segment arriving at a closed port.
tcp.blackhole=1 - only drop SYN without RST
tcp.blackhole=2 - drop everything without RST
tcp.blackhole=0 - always send RST - default behaviour
This confuses nmap -sF or -sX or -sN quite badly.
sysctl knobs.
With these knobs on, refused connection attempts are dropped
without sending a RST, or Port unreachable in the UDP case.
In the TCP case, sending of RST is inhibited iff the incoming
segment was a SYN.
Docs and rc.conf settings to follow.
- Sort xrefs
- FreeBSD.ORG -> FreeBSD.org
- Be consistent with section names as outlines in mdoc(7)
- Other misc mdoc cleanup.
PR: doc/13144
Submitted by: Alexy M. Zelkin <phantom@cris.net>
with a match probability to achieve non-deterministic behaviour of
the firewall. This can be extremely useful for testing purposes
such as simulating random packet drop without having to use dummynet
(which already does the same thing), and simulating multipath effects
and the associated out-of-order delivery (this time in conjunction
with dummynet).
The overhead on normal rules is just one comparison with 0.
Since it would have been trivial to implement this by just adding
a field to the ip_fw structure, I decided to do it in a
backward-compatible way (i.e. struct ip_fw is unchanged, and as a
consequence you don't need to recompile ipfw if you don't want to
use this feature), since this was also useful for -STABLE.
When, at some point, someone decides to change struct ip_fw, please
add a length field and a version number at the beginning, so userland
apps can keep working even if they are out of sync with the kernel.
respectively logging and dropping ICMP REDIRECT packets.
Note that there is no rate limiting on the log messages, so log_redirect
should be used with caution (preferrably only for debugging purposes).
_or_ you may specify "log logamount number" to set logging specifically
the rule.
In addition, "ipfw resetlog" has been added, which will reset the
logging counters on any/all rule(s). ipfw resetlog does not affect
the packet/byte counters (as ipfw reset does), and is the only "set"
command that can be run at securelevel >= 3.
This should address complaints about not being able to set logging
amounts, not being able to restart logging at a high securelevel,
and not being able to just reset logging without resetting all of the
counters in a rule.
would make a difference. However, my previous diff _did_ change the
behavior in some way (not necessarily break it), so I'm fixing it.
Found by: bde
Submitted by: bde
exit on errors.
If we don't, in_pcbrehash() is called without a preceeding
in_pcbinshash(), causing a crash.
There are apparently several conditions that could cause the crash;
PR misc/12256 is only one of these.
PR: misc/12256
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt. This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion. This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.
Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system and that it
was the actual cause of the leak and looks like it fixes it. The machine
in question was loosing (from memory) about 150 mbufs per hour under
load and a change similar to this stopped it. (Don't blame Jayanth
for this patch though)
An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened. However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code. There are other things that call pru_send()
directly though.
Problem originally noted by: John Plevyak <jplevyak@inktomi.com>
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.
cdevsw_add() will print an message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
(default 1) disables PMTUD globally. Although PMTUD can be disabled in
the standard case by locking the MTU on a static route (including the
default route), this method doesn't work in the face of dynamic routing
protocols like gated.
+ add a missing call to dn_rule_delete() when flushing firewall
rules, thus preventing possible panics due to dangling pointers
(this was already done for single rule deletes).
+ improve "usage" output in ipfw(8)
+ add a few checks to ipfw pipe parameters and make it a bit more
tolerant of common mistakes (such as specifying kbit instead of Kbit)
PR: kern/10889
Submitted by: Ruslan Ermilov
routines. The descriptor contains parameters which could be used
within those routines (eg. ip_output() ).
On passing, add IPPROTO_PGM entry to netinet/in.h
+ plug an mbuf leak when dummynet used with bridging
+ make prototype of dummynet_io consistent with usage
+ code cleanup so that now bandwidth regulation is precise to the
bit/s and not to (8*HZ) bit/s as before.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1:
s/suser/suser_xxx/
2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with
later.
There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
- unifdef -DCOMPAT_IPFW (this was on by default already)
- remove traces of in-kernel ip_nat package, it was never committed.
- Make IPFW and DUMMYNET initialize themselves rather than depend on
compiled-in hooks in ip_init(). This means they initialize the same
way both in-kernel and as kld modules. (IPFW initializes now :-)
the IP header (this would not work for bridged packets).
This has been fixed long ago in the 2.2 branch.
Problem noticed by: a few people
Fix suggested by: Remy Nonnenmacher
also rely less on other modules clearing static values, and clear them
in a few cases we missed before.
Submitted by: Matthew Reimer <mreimer@vpop.net>
Move the Olicom token ring driver to the officially sanctionned location of
/sys/contrib. Also fix some brokenness in the generic token ring support.
Be warned that if_dl.h has been changed and SOME programs might
like recompilation.
- Transparent proxying support added.
- PPTP redirecting support added based on patches
contributed by Dru Nelson <dnelson@redwoodsoft.com>.
Submitted by: Charles Mott <cmott@srv.net>
their ttl). This can be used - in combination with the proper ipfw
incantations - to make a firewall or router invisible to traceroute
and other exploration tools.
This behaviour is controlled by a sysctl variable (net.inet.ip.stealth)
and hidden behind a kernel option (IPSTEALTH).
Reviewed by: eivind, bde
This is for various Olicom cards. An IBM driver is following.
This patch also adds support to tcpdump to decode packets on tokenring.
Congratulations to the proud father.. (below)
Submitted by: Larry Lile <lile@stdio.com>
This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
convince myself that nothing will break if we permit IP input while
interface addresses are unconfigured. (At worst, they will hit some
ULP's PCB scan and fail if nobody is listening.) So, remove the restriction
that addresses must be configured before packets can be input. Assume
that any unicast packet we receive while unconfigured is potentially ours.
Divert was not feeding clean data to ifa_ifwithaddr() so it was
giving bad results.
Submitted by: kseel <kseel@utcorp.com>, Ruslan Ermilov <ru@ucb.crimea.ua>
This was missed in the 4.4-Lite2 merge.
Noticed by: Mohan Parthasarathy <Mohan.Parthasarathy@eng.Sun.COM> and
jayanth@loc201.tandem.com (vijayaraghavan_jayanth)
on the tcp-impl mailing list.
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's
fix for this problem.
state.
Note: this requires a recompilation of netstat (but netstat has been
broken since rev 1.52 of ip_mroute.c anyway)
Obtained from: Significantly based on Steve McCanne's
<mccanne@cs.berkeley.edu> work for BSD/OS
arplookup() to try again. This gets rid of at least one user's
"arpresolve: can't allocate llinfo" errors, and arplookup() gives
better error messages to help track down the problem if there really
is a problem with the routing table.
problems reported recently (the rtentry pointer in the dummynet
queue was not initialized in all cases, resulting in spurious
rt_refcnt decreases in the lucky cases, and memory trashing in
other cases.
have all fields in network order, whereas ipfw expects some to be
in host order. This resulted in some incorrect matching, e.g. some
packets being identified as fragments, or bandwidth not being
correctly enforced.
NOTE: this only affects bridge+ipfw, normal ipfw usage was already
correct).
Reported-By: Dave Alden and others.
Add bounds checking to netbios NS packet resolving code. This should
prevent natd from crashing on badly formed netbios packets (as might be
heard when the machine is sitting on a cable modem or certain DSL
networks), and also closes potential security holes that might have
exploited the lack of bounds checking in the previous version of the
code.
If timer calculation results in degenerate value (0), force it to 1
to avoid divide-by-zero panic later on in calls to IGMP_RANDOM_DELAY().
I considered simply adding 1 to the timer calculation, but was unsure
if the calculation was part of the IGMP standard or not so did not want
to mess with it for all cases.
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
option not defined the sysctl int value is set to -1 and read-only.
#ifdef KERNEL's added appropriately to wall off visibility of kernel
routines from user code.
Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl. If option
is specified in kernel config, icmplim defaults to 100 pps. Setting it
to 0 will disable the feature. This feature limits ICMP error responses
for packets sent to bad tcp or udp ports, which does a lot to help the
machine handle network D.O.S. attacks.
The kernel will report packet rates that exceed the limit at a rate of
one kernel printf per second. There is one issue in regards to the
'tail end' of an attack... the kernel will not output the last report
until some unrelated and valid icmp error packet is return at some
point after the attack is over. This is a minor reporting issue only.
when a TCP "stealth" scan is directed at a *BSD box by ensuring the window
is 0 for all RST packets generated through tcp_respond()
Reviewed by: Don Lewis <Don.Lewis@tsc.tdk.com>
Obtained from: Bugtraq (from: Darren Reed <avalon@COOMBS.ANU.EDU.AU>)
This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has SO_REUSEPORT
set.
PR: kern/7713
addresses by default.
Add a knob "icmp_bmcastecho" to "rc.network" to allow this
behaviour to be controlled from "rc.conf".
Document the controlling sysctl variable "net.inet.icmp.bmcastecho"
in sysctl(3).
Reviewed by: dg, jkh
Reminded on -hackers by: Steinar Haug <sthaug@nethelp.no>
4.1.4. Experimental Protocol
A system should not implement an experimental protocol unless it
is participating in the experiment and has coordinated its use of
the protocol with the developer of the protocol.
Pointed out by: Steinar Haug <sthaug@nethelp.no>
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
several new features are added:
- support vc/vp shaping
- support pvc shadow interface
code cleanup:
- remove WMAYBE related code. ENI WMAYBE DMA doen't work.
- remove updating if_lastchange for every packet.
- BPF related code is moved to midway.c as it should be.
(bpfwrite should work if atm_pseudohdr and LLC/SNAP are
prepended.)
- BPF link type is changed to DLT_ATM_RFC1483.
BPF now understands only LLC/SNAP!! (because bpf can't
handle variable link header length.)
It is recommended to use LLC/SNAP instead of NULL
encapsulation for various reasons. (BPF, IPv6,
interoperability, etc.)
the code has been used for months in ALTQ and KAME IPv6.
OKed by phk long time ago.
`len = min(so->so_snd.sb_cc, win) - off;'. min() has type u_int
and `off' has type int, so when min() is 0 and `off' is 1, the RHS
overflows to 0U - 1 = UINT_MAX. `len' has type long, so when
sizeof(long) == sizeof(int), the LHS normally overflows to to the
correct value of -1, but when sizeof(long) > sizeof(int), the LHS
is UINT_MAX.
Fixed some u_long's that should have been fixed-sized types.
mismatches exposed by this (the prototype for tcp_respond() didn't
match the function definition lexically, and still depends on a
gcc feature to match if ints have more than 32 bits).
Any packet that can be matched by a ipfw rule can be redirected
transparently to another port or machine. Redirection to another port
mostly makes sense with tcp, where a session can be set up
between a proxy and an unsuspecting client. Redirection to another machine
requires that the other machine also be expecting to receive the forwarded
packets, as their headers will not have been modified.
/sbin/ipfw must be recompiled!!!
Reviewed by: Peter Wemm <peter@freebsd.org>
Submitted by: Chrisy Luke <chrisy@flix.net>
Remove lots'o'hacks.
looutput is now static.
Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the charateristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this mis-use each of which had
hacks to cover them up.
Consists largly of hack removal :-)
or unsigned int (this doesn't change the struct layout, size or
alignment in any of the files changed in this commit, at least for
gcc on i386's. Using bitfields of type u_char may affect size and
alignment but not packing)).
ifdef tangle. The previous commit to ip_fil.h didn't change the
one that actually applies to the current FreeBSD kernel, of course.
Fixed.
Fixed style bugs in previous commit to ip_fil.h.
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
so that the new behaviour is now default.
Solves the "infinite loop in diversion" problem when more than one diversion
is active.
Man page changes follow.
The new code is in -stable as the NON default option.
net.inet.ip.icmp.bmcastecho = 0 by removing the extra check for the
address being a multicast address. The test now relies on the link
layer flags that indicate it was received via multicast. The previous
logic was broken and replied to ICMP echo/timestamp broadcasts even
when the sysctl option disallowed them.
Reviewed by: wollman
Prior to this change, Accidental recursion protection was done by
the diverted daemon feeding back the divert port number it got
the packet on, as the port number on a sendto(). IPFW knew not to
redivert a packet to this port (again). Processing of the ruleset
started at the beginning again, skipping that divert port.
The new semantic (which is how we should have done it the first time)
is that the port number in the sendto() is the rule number AFTER which
processing should restart, and on a recvfrom(), the port number is the
rule number which caused the diversion. This is much more flexible,
and also more intuitive. If the user uses the same sockaddr received
when resending, processing resumes at the rule number following that
that caused the diversion. The user can however select to resume rule
processing at any rule. (0 is restart at the beginning)
To enable the new code use
option IPFW_DIVERT_RESTART
This should become the default as soon as people have looked at it a bit
passed to the user process for incoming packets. When the sockaddr_in
is passed back to the divert socket later, use thi sas the primary
interface lookup and only revert to the IP address when the name fails.
This solves a long standing bug with divert sockets:
When two interfaces had the same address (P2P for example) the interface
"assigned" to the reinjected packet was sometimes incorect.
Probably we should define a "sockaddr_div" to officially hold this
extended information in teh same manner as sockaddr_dl.
of the TCP payload. See RFC1122 section 4.2.2.6 . This allows
Path MTU discovery to be used along with IP options.
PR: problem discovered by Kevin Lahey <kml@nas.nasa.gov>
so it must be adjusted (minus 1) before using it to do the length check.
I'm not sure who to give the credit to, but the bug was reported by
Jennifer Dawn Myers <jdm@enteract.com>, who also supplied a patch. It
was also fixed in OpenBSD previously by andreas.gunnarsson@emw.ericsson.se,
and of course I did the homework to verify that the fix was correct per
the specification.
PR: 6738
NetBSD, ported to FreeBSD by Pierre Beyssac <pb@fasterix.freenix.org> and
minorly tweaked by me.
This is a standard part of FreeBSD, but must be enabled with:
"sysctl -w net.inet.ip.fastforwarding=1" ...and of course forwarding must
also be enabled. This should probably be modified to use the zone
allocator for speed and space efficiency. The current algorithm also
appears to lose if the number of active paths exceeds IPFLOW_MAX (256),
in which case it wastes lots of time trying to figure out which cache
entry to drop.