TODO items. These are interesting todo items.
o understand synthesized DNAMEs, so those TTL=0 packets are cached properly.
o NSEC/NSEC3 aggressive negative caching, so that updates to NSEC/NSEC3
  will result in proper negative responses.
o (option) where port 53 is used for send and receive, no other ports are used.
o (option) to not send replies to clients after a timeout of (say 5 secs) has
  passed, but keep task active for later retries by client.
o (option) private TTL feature (always report TTL x in answers).
o (option) pretend-dnssec-unaware, and pretend-edns-unaware modes for workshops.
o delegpt use rbtree for ns-list, to avoid slowdown for very large NS sets.
o (option) reprime and refresh oft used data before timeout.
o (option) retain prime results in an overlaid roothints file.
o (option) store primed key data in an overlaid keyhints file (sort of like drafttimers).
o windows version, auto update feature, a query to check for the version.
o command the server with TSIG inband. get-config, clearcache,
  get stats, get memstats, get ..., reload, clear one zone from cache
o NSID rfc 5001 support.
o timers rfc 5011 support.
o Treat YXDOMAIN from a DNAME properly, in iterator (not throwaway), validator.
o make timeout backoffs randomized (a couple percent random) to spread traffic.
o inspect date on executable, then warn user in log if it's more than 1 year old.
o (option) proactively prime root, stubs and trust anchors, feature.
  early failure, faster on first query, but more traffic.
o library add convenience functions for A, AAAA, PTR, getaddrinfo, libresolve.
o library add function to validate input from app that is signed.
o add dynamic-update requests (making a dynupd request) to libunbound api.
o SIG(0) and TSIG.
o support OPT record placement on recv anywhere in the additional section.
o add local-file: config with authority features.
o (option) to make local-data answers be secure for libunbound (default=no)
o (option) to make chroot: copy all needed files into jail (or make jail)
  perhaps also print reminder to link /dev/random and sysloghack.
o overhaul outside-network servicedquery to merge with udpwait and tcpwait,
  to make timers in servicedquery independent of udpwait queues.
o check into rebinding ports for efficiency, configure time test.
o EVP hardware crypto support.
o option to ignore all inception and expiration dates for rrsigs.
o cleaner code; return and func statements on newline.
o memcached module that sits before validator module; checks for memcached
  data (on local lan), stores recursion lookup. Provides one cache for
  multiple resolver machines, coherent reply content in anycast setup.
o no openssl_add_all_algorithms, but only the ones necessary, less space.
o listen to NOTIFY messages for zones and flush the cache for that zone
  if received. Useful when also having a stub to that auth server.
  Needs proper protection, TSIG, in place.
o winevent - do not go more than 64 fds (by polling with select one by
  one), win95/98 have 100fd limit in the kernel, so this ruins w9x portability.
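
The randomized timeout backoff item in the list above could be sketched as
follows. This is only an illustration: the function name, the uniform
distribution and the default of 2 percent are assumptions, not taken from
the unbound sources.

```python
import random


def jittered_timeout(base_ms, fraction=0.02, rng=random):
    """Return base_ms scaled by a uniform factor in [1 - fraction, 1 + fraction].

    fraction=0.02 is an assumed reading of "a couple percent random".
    rng may be the random module or a random.Random instance.
    """
    if base_ms < 0 or not 0 <= fraction < 1:
        raise ValueError("base_ms must be >= 0 and fraction in [0, 1)")
    return base_ms * (1.0 + rng.uniform(-fraction, fraction))
```

Calling jittered_timeout(1000) yields a value between roughly 980 and 1020,
so resolvers that time out together do not all retry at the same instant.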

*** Features, for later
* dTLS, TLS, look to need special port numbers, cert storage, recent libssl.
* aggressive negative caching for NSEC, NSEC3.
* multiple queries per question, server exploration, server selection.
* support TSIG on queries, for validating resolver deployment.
* retry-mode, where a bogus result triggers a retry-mode query, where a list
  of responses over a time interval is collected, and each is validated.
  or try in TCP mode. Do not 'try all servers several times', since we must
  not create packet storms with operator errors.
o on windows version, implement the OS ancillary data capabilities for
  interface-automatic. IP_PKTINFO, IPV6_PKTINFO for WSARecvMsg, WSASendMsg.
o local-zone directive with authority service, full authority server
  is a non-goal.
o infra and lame cache: easier size config (in Mb), show usage in graphs.
- store time of dump in cachedumps, so that on a load the ttls can be
  compared to the absolute time, and now-expired items can be dealt with.
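
A minimal sketch of the cachedump item above, assuming the dump stores
relative TTLs plus one absolute dump timestamp in seconds. The function name
and the choice to return None for expired entries are hypothetical.

```python
def remaining_ttl(ttl_at_dump, dump_time, load_time):
    """Return the TTL left at load time, or None if the entry has expired.

    ttl_at_dump: relative TTL (seconds) as written in the dump.
    dump_time, load_time: absolute times in seconds since the epoch.
    A load_time before dump_time (clock stepped back) counts as zero elapsed.
    """
    elapsed = max(0, load_time - dump_time)
    left = ttl_at_dump - elapsed
    return left if left > 0 else None
```

An entry dumped with TTL 300 and loaded 100 seconds later keeps a TTL of 200;
loaded 300 or more seconds later it is reported as expired and can be skipped.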

1.3.x:
- spoofed delegpt fixes - if DNSKEY prime fails
- set DNSKEY bogus and DNSKEY query msg bogus.
- make NS set bogus too - if not validated as secure.
- check where queries go - otherwise reduce TTL on NS.
- also make DS NSEC bogus. Also DS msg cache entry.
- mark bogus under stringent conditions
- if DS at parent and validly signed. Then DNSKEY must exist.
- Also for trust anchor points themselves. DNSKEY must exist.
- so if then DNSKEY keyprime fails
- then it is not simply a server that only answers qtype A.
- then parent is agreeing (somewhat) with the DS record
- but it could still be a lame domain, these exist
  The objective is to keep tries for genuinely lame domains to a
  minimum, while detecting forgeries quickly. exponential backoff.
- for unbound we can check if we got something to verify while
  building that chain of trust. If so - not lame, aggressive retry.
- but security-lame zones also exist and should not pose
  too high a burden. Exponential backoff again.
  (e.g. badly signed or dnskey reply too large fails).
- the delegation NS for the domain is bogus.
  The referral is retried, with exponential backoff.
  This exponential backoff should go towards values which are close
  to the TTLs that are used now (on lame delegations for example),
  so that the extra traffic is manageable.
- for unbound, reset the TTL on the NS rrset. Let it timeout.
  Set NS rrset bogus - no more queries to the domain are done.
  Also set DNSKEY and DS (rrset, NSEC, msg) bogus and ttl like that.
  (to the same absolute value, so a clean retry is done).
  TTL of NS is (rounddown) timeout in seconds.
  Until the NS times out and referral is done again.
  Make sure multiple validations for chains of trust do not result
  in a flood of queries or backoff too quickly.
- bogus exponential backoff cache. hash(name,t,c), size(1M, 5%).
  TTL of 24h. Backoff from 200msec to 24h.
  x2 on bogus(18 tries), x8 backoff on lameness(6 tries),
  when servfail for DNSKEY.
  remove entry when validated as secure.
  delegptspoofrecheck on lameness when harden-referral-path NS
  query has servfail, then build chain of trust down (check DS,
  then perform DNSKEY query) if that DNSKEY query fails servfail,
  perform the x8 lameness retry fallback.
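
The try counts quoted for the backoff cache can be checked with a short
sketch. It assumes the delay starts at 200 msec, is multiplied after every
failure, and is no longer used once it reaches 24h; the function name is
hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 24h expressed in milliseconds


def backoff_schedule(initial_ms, multiplier, cap_ms=DAY_MS):
    """Return the successive backoff delays (ms) that stay below cap_ms."""
    if initial_ms <= 0 or multiplier <= 1:
        raise ValueError("initial_ms must be > 0 and multiplier > 1")
    delays = []
    delay = initial_ms
    while delay < cap_ms:
        delays.append(delay)
        delay *= multiplier
    return delays


# Number of multiplications that keep the delay under 24h.
bogus_steps = len(backoff_schedule(200, 2)) - 1
lame_steps = len(backoff_schedule(200, 8)) - 1
```

Under these assumptions bogus_steps is 18 and lame_steps is 6, which agrees
with the "18 tries" and "6 tries" figures in the item above.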

* keep a list of guilty IP addresses in the qstate, which contains both
  the child side guilty IPs and the parent guilty IPs. Valid signed DSes
  are not made guilty in the global cache. The child IP is made guilty
  in the global cache.
* Retry to higher trust anchors.
* option not to retry to higher from this ta.
* keep longest must-be-secure name. Do not accept insecure above this point.
* if failed ta, blame all lower tas for their DNSKEY (get IP from cached
  rrset), if failure is insecure - nothing, if at bogus - blame that too.
  lower tas have isdata=false, so the IP address for the dnskeyrrset in
  the cache is set to avoid in qstate. Nothing in infracache, no childretry.
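
One possible shape for the per-query guilty IP list is sketched below. It
assumes guilty addresses are moved to the back of the candidate order rather
than dropped, so they remain usable when nothing else is left. The function
name and the list representation are hypothetical, not unbound's qstate
layout.

```python
def order_candidates(candidates, guilty):
    """Return candidates with non-guilty addresses first, guilty ones last.

    The relative order inside each group is preserved.
    """
    guilty = set(guilty)
    clean = [ip for ip in candidates if ip not in guilty]
    last_resort = [ip for ip in candidates if ip in guilty]
    return clean + last_resort
```

With candidates 192.0.2.1, 192.0.2.2, 192.0.2.3 and 192.0.2.1 marked guilty,
the query would try 192.0.2.2 and 192.0.2.3 before falling back to 192.0.2.1.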

Retry harder to get valid DNSSEC data.
Triggered by a trust anchor or by a signed DS record for a zone.
* If data is fetched and validation fails for it
  or DNSKEY is fetched and validated into chain-of-trust fails for it
  or DS is fetched and validated into chain-of-trust fails for it
  Then
  blame(signer zone, IP origin of the data/DNSKEY/DS, x2, isdata)
* If data was not fetched (SERVFAIL, lame, ...), and the data
  is under a signed DS then:
  blame(thatDSname, IP origin of the data/DNSKEY/DS, x8)
  x8 because the zone may be lame.
  This means a chain of trust is built also for unfetched data, to
  determine if a signed DS is present. If insecure, nothing is done.
* If DNSKEY was not fetched for chain of trust (SERVFAIL, lame, ...),
  Then
  blame(DNSKEYname, IP origin of the data/DNSKEY/DS, x8)
  x8 because the zone may be lame.
* blame(zonename, guiltyIP, multiplier, isdata):
  * if isdata:
    Set the guiltyIP,zonename as DNSSEC-bogus-data=true in lameness cache.
    Thusly marked servers are avoided if possible, used as last resort.
    The guilt TTL is the infra cache ttl (15 minutes).
    The dnssec retry scheme works without this cache entry.
  * If the key cache entry 'being-backed-off' is true and isdata then:
    The parent is backed off, it must be the child's fault. Retry to child.
    if the child-dnskey is bogus, then retry is useless, stop.
    Perform a child-retry - purge dataonly, childside, mark
    data-IPaddress from child as to avoid-forquery. counterperquery,
    max is 3, if reached, set this data element RRset&msg to the
    current backoff TTL end-time or bogus-ttl(60 seconds) whichever is less
    and done.
  * if no retry entry exists for the zone key, create one with 24h TTL, 10 ms.
    else the backoff *= multiplier.
  * If the backoff is less than a second, remove entries from cache and
    restart query. Else set the TTL for the entries to that value.
  * Entries to set or remove: DNSKEY RRset&msg, DS RRset&msg, NS RRset&msg,
    in-zone glue (A and AAAA) RRset&msg, and key-cache-entry TTL.
    Set the data element RRset&msg to the backoff TTL or bogusttl.
    If TTL>1sec set key-cache-entry flag 'being-backed-off' to true.
    when entry times out that flag is reset to false again.
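
The backoff part of blame() can be sketched as below. This is a reduced
illustration, not the planned implementation: it covers only the retry entry
(10 ms start, multiply on later calls, one second threshold) and leaves out
the guilty IP marking, the child-retry path, the cache purging itself and the
24h expiry of the entry. The names RetryEntry and retry_table are
hypothetical, and treating a backoff of exactly one second as "hold" is an
assumption.

```python
from dataclasses import dataclass

INITIAL_BACKOFF_MS = 10   # from the text: new entries start at 10 ms
RESTART_BELOW_MS = 1000   # from the text: below one second, purge and restart


@dataclass
class RetryEntry:
    backoff_ms: int = INITIAL_BACKOFF_MS
    being_backed_off: bool = False


def blame(retry_table, zone, multiplier):
    """Update the backoff for zone and say what the caller should do.

    Returns ("restart", None) when cached entries should be removed and the
    query restarted, or ("hold", ttl_ms) when the entries should instead get
    ttl_ms as their TTL.
    """
    entry = retry_table.get(zone)
    if entry is None:
        entry = retry_table[zone] = RetryEntry()
    else:
        entry.backoff_ms *= multiplier
    if entry.backoff_ms < RESTART_BELOW_MS:
        return ("restart", None)
    entry.being_backed_off = True
    return ("hold", entry.backoff_ms)
```

With multiplier 2 the first seven calls for a zone return "restart" (10 ms up
to 640 ms) and the eighth returns ("hold", 1280). With multiplier 8 the
fourth call already holds, at 5120 ms.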
* Storage extra is:
  IP address per RRset and message. A lot of memory really, since that is
  132 bytes per RRset and per message. Store plain IP: 4/16 bytes, len byte.
  port number 2bytes. +19bytes per RRset, per msg.
  guilt flag in infra(lameness) cache.
  being-backed-off flag for key cache, also backoff time value and its TTL.
  child-retry-count and guilty-ip-list in qstate.
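
The +19 bytes figure is 1 length byte, 16 address bytes and 2 port bytes.
A sketch of such a compact record, assuming IPv4 addresses are padded to the
16-byte slot and the port is stored in network byte order. The field order
and function names are hypothetical.

```python
import ipaddress
import struct

# 1 length byte, 16 address bytes (zero padded), 2-byte port, no alignment.
ORIGIN_FORMAT = "!B16sH"


def pack_origin(ip_text, port):
    """Pack an IPv4 or IPv6 address and a port into a fixed 19-byte record."""
    addr = ipaddress.ip_address(ip_text).packed  # 4 bytes (v4) or 16 (v6)
    return struct.pack(ORIGIN_FORMAT, len(addr), addr, port)


def unpack_origin(blob):
    """Reverse pack_origin, returning (ip_text, port)."""
    length, padded, port = struct.unpack(ORIGIN_FORMAT, blob)
    return str(ipaddress.ip_address(padded[:length])), port
```

Both pack_origin("192.0.2.1", 53) and pack_origin("2001:db8::1", 53) produce
records of struct.calcsize(ORIGIN_FORMAT) bytes, which is 19.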
* Load on authorities:
  For lame servers: 7 tries per day (one per three hours on average).
  Others get up to 23 tries per day (one per hour on average).
  +1 for original try makes 8/24 hours and 24/24 hours.
  Unless the cache entry falls out of the cache due to memory. In that
  case it can be tried more often, this is similar to the NS entry falling
  out of the cache due to memory, in that case it also has to be retried.
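
One way to arrive at the 7 and 23 figures is to count how often a 10 ms
backoff can be multiplied before it reaches the 24h lifetime of the retry
entry. The sketch assumes x8 for lame servers and x2 for the others, as in
blame() above; the function name is hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 24h entry lifetime in milliseconds


def retries_within(initial_ms, multiplier, window_ms=DAY_MS):
    """Count the multiplications of initial_ms that stay below window_ms."""
    count = 0
    delay = initial_ms * multiplier
    while delay < window_ms:
        count += 1
        delay *= multiplier
    return count


lame_tries = retries_within(10, 8)    # x8 backoff on lameness
other_tries = retries_within(10, 2)   # x2 backoff on bogus data
```

This gives lame_tries = 7 and other_tries = 23. Adding the original try makes
8 and 24 per 24 hours, which is one per three hours and one per hour on
average.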
* Performance analysis:
  * domain is sold. Unbound sees invalid signature (expired) or the old
    servers refuse the queries. Retry within the second, if parent has
    new DS and NS available instantly works again (no downtime).
  * domain is bogus signed. Parent gets 1 query per hour.
    Domain itself gets a couple of tries per queryname, per minute.
  * domain partly bogus. Parent gets 1 query per hour.
    Domain itself gets a couple of tries per bogus queryname, per minute.
  * spoof attempt. Unbound tries a couple times. If not spoofed again,
    it works, if spoofed every time unbound backs off and stops trying.
    But childretry is attempted more often, once per minute.
  * parent has inconsistently signed DS records. Together with a subzone that
    is badly managed. Unbound backs up to the root once per hour.
  * parent has bad DS records, different sets on different servers, but they
    are signed ok. Works as for every query a list of bad nameservers, parent
    and child side is kept, walks through them. But as backoff increases
    and becomes bigger than the TTL on the DS records, unbound will black out.
    The parent really has to be fixed...
    The issue is that it is validly signed, but bad data. Unbound will very
    conservatively retry it.
  * domain is sold, but decommission is faster than the setup of new server.
    Unbound does exponential backoff, if new setup is fast, it'll pick up the
    new data fast.
  * key rollover failed. The zone has bad keys. Like it was bogus signed.
  * one nameserver has bad data. Unbound goes back to the parent but also
    marks that server as guilty. Picks data from other server right after,
    retry without blackout for the user.
    When parent starts to get backed off, if the nameserver is childside,
    queryretries for childservers are made when queries fail.
  * domain was sold, but unbound has old entries in the cache. These somehow
    need (re)validation (were queried with +cd, now -cd). The entries are
    bogus.
    Unbound performs childretry for these entries. Works once the keys
    have been successfully reprimed with parentretry.
  * unbound is configured to talk to upstream caches. These caches have
    inconsistent bad data. If one is bad, it is marked bad for that zone.
    If all are bad, there may not be any way for unbound to remove the
    bad entries from the upstream caches. It simply fails.
    Recommendation: make the upstream caches validate as well.
  * Old data that was valid with a long TTL remains in the cache.
    Valid data has a TTL and this is the protocol.
  * listing bad servers and trying again may not be good enough, since
    a combinatorial explosion for DSxDNSKEYxdata is possible for every
    signature validation (using different nameservers for DS, DNSKEY and
    data, assuming only the right combination has a chain of trust to data).
    The parentretries perform DS and DNSKEY searching.
    childretries perform data searching.

later
- selective verbosity; ubcontrol trace example.com
- option to log only bogus domainname encountered, for demos
- cache fork-dump, pre-load
- for fwds, send queries to N servers in fwd-list, use first reply.
  document highly scalable, highly available unbound setup one-pager.
- prefetch DNSKEY when DS in delegation seen (nonCD, underTA).
- use libevent if available on system by default(?), default outgoing 256to1024