Instead of an RBT for the forwarders table, use a QP trie.
We now use reference counting for dns_forwarders_t. When a forwarders
object is retrieved by dns_fwdtable_find(), it must now be explicitly
detached by the caller afterward.
QP tries require stored objects to include their names, so the
the forwarders object now has that. This obviates the need to
pass back a separate 'foundname' value from dns_fwdtable_find().
replace the red-black tree used by the negative trust anchor table
with a QP trie.
because of this change, dns_ntatable_init() can no longer fail, and
neither can dns_view_initntatable(). these functions have both been
changed to type void.
this function finds the closest matching ancestor, but the function
name could be read to imply that it returns the direct parent node;
this commit suggests a slightly less misleading name.
Make the `pval_r` and `ival_r` out arguments optional.
Add `pval_r` and `ival_r` out arguments to `dns_qp_deletekey()`
and `dns_qp_deletename()`, to return the deleted leaf.
Allow larger TTL values in zones that go insecure. This is necessary
because otherwise the zone will not be loaded due to the max-zone-ttl
of P1D that is part of the current insecure policy.
In the keymgr.c code, default back to P1D if the max-zone-ttl is set
to zero.
Add an option to enable/disable inline-signing inside the
dnssec-policy clause. The existing inline-signing option that is
set in the zone clause takes priority, but if it is omitted, then the
value that is set in dnssec-policy is taken.
The built-in policies use inline-signing.
This means that if you want to use the default policy without
inline-signing you either have to set it explicitly in the zone
clause:
zone "example" {
...
dnssec-policy default;
inline-signing no;
};
Or create a new policy, only overriding the inline-signing option:
dnssec-policy "default-dynamic" {
inline-signing no;
};
zone "example" {
...
dnssec-policy default-dynamic;
};
This also means that if you are going insecure with a dynamic zone,
the built-in "insecure" policy needs to be accompanied with
"inline-signing no;".
The updatenotify mechanism in dns_db relied on unlocked ISC_LIST for
adding and removing the "listeners". The mechanism relied on the
exclusive mode - it should have been updated only during reconfiguration
of the server. This turned not to be true anymore in the dns_catz - the
updatenotify list could have been updated during offloaded work as the
offloaded threads are not subject to the exclusive mode.
Change the update_listeners to be cds_lfht (lock-free hash-table), and
slightly refactor how register and unregister the callbacks - the calls
are now idempotent (the register call already was and the return value
of the unregister function was mostly ignored by the callers).
The dns_badcache unit had (yet another) own locked hashtable
implementation. Replace the hashtable used by dns_badcache with
lock-free cds_lfht implementation from liburcu.
When dns_request was canceled via dns_requestmgr_shutdown() the cancel
event would be propagated on different loop (loop 0) than the loop where
request was created on. In turn this would propagate down to isc_netmgr
where we require all the events to be called from the matching isc_loop.
Pin the dns_requests to the loops and ensure that all the events are
called on the associated loop. This in turn allows us to remove the
hashed locks on the requests and change the single .requests list to be
a per-loop list for the request accounting.
Additionally, do some extra cleanup because some race condititions are
now not possible as all events on the dns_request are serialized.
The isc_stats_create() can no longer return anything else than
ISC_R_SUCCESS. Refactor isc_stats_create() and its variants in libdns,
libns and named to just return void.
1. Change the _new, _add and _copy functions to return the new object
instead of returning 'void' (or always ISC_R_SUCCESS)
2. Cleanup the isc_ht_find() + isc_ht_add() usage - the code is always
locked with catzs->lock (mutex), so when isc_ht_find() returns
ISC_R_NOTFOUND, the isc_ht_add() must always succeed.
3. Instead of returning direct iterator for the catalog zone entries,
add dns_catz_zone_for_each_entry2() function that calls callback
for each catalog zone entry and passes two extra arguments to the
callback. This will allow changing the internal storage for the
catalog zone entries.
4. Cleanup the naming - dns_catz_<fn>_<obj> -> dns_catz_<obj>_<fn>, as an
example dns_catz_new_zone() gets renamed to dns_catz_zone_new().
store a pointer to the running loop when creating a dispatch entry
with dns_dispatch_add(), and use isc_loop_now() to get the timestamp for
the current event loop tick when we initialize the dispentry start time
and check for timeouts.
ultimately we want the slab implementation of dns_rdataset to
be usable by more database implementaions than just rbtdb. this
commit moves rdataset_methods to rdataslab.c, renamed
dns_rdataslab_rdatasetmethods.
new database methods have been added: locknode, unlocknode,
addglue, expiredata, and deletedata, allowing external functions to
perform functions that previously required internal access to the
database implementation.
database and heap pointers are now stored in the dns_slabheader object
so that header is the only thing that needs to be passed to some
functions; this will simplify moving functions that process slabheaders
out of rbtdb.c so they can be used by other database implementations.
to reduce the amount of common code that will need to be shared
between the separated cache and zone database implementations,
clean up unused portions of dns_db.
the methods dns_db_dump(), dns_db_isdnssec(), dns_db_printnode(),
dns_db_resigned(), dns_db_expirenode() and dns_db_overmem() were
either never called or were only implemented as nonoperational stub
functions: they have now been removed.
dns_db_nodefullname() was only used in one place, which turned out
to be unnecessary, so it has also been removed.
dns_db_ispersistent() and dns_db_transfernode() are used, but only
the default implementation in db.c was ever actually called. since
they were never overridden by database methods, there's no need to
retain methods for them.
in rbtdb.c, beginload() and endload() methods are no longer defined for
the cache database, because that was never used (except in a few unit
tests which can easily be modified to use the zone implementation
instead). issecure() is also no longer defined for the cache database,
as the cache is always insecure and the default implementation of
dns_db_issecure() returns false.
for similar reasons, hashsize() is no longer defined for zone databases.
implementation functions that are shared between zone and cache are now
prepended with 'dns__rbtdb_' so they can become nonstatic.
serve_stale_ttl is now a common member of dns_db.
in preparation for splitting up rbtdb.c, rename some types so they
can be defined in dns/types.h instead of only locally. these include:
- struct noqname, which is used to hold no-qname and closest-encloser
proofs, and is now named dns_proof_t;
- rbtdb_rdatatype_t, which is used to hold a pair of rdatatypes and
is now called dns_typepair_t and defined in rdatatype.h;
- rbtdb_serial_t, which is now just a uint32_t;
- rdatasetheader_t and rdatasetheaderlist_t, now called
dns_slabheader_t and dns_slabheaderlist_t;
- rbtdb_version_t, now called dns_rbtdb_version_t.
the helper functions header_from_raw() and raw_from_header() are
renamed dns_slabheader_fromrdataset() and dns_slabheader_raw().
also made further style changes:
- fixing uninitialized pointer variables throughout rbtdb.c;
- switching some initializations to struct literals;
- renaming some functions and struct members more descriptively;
- replacing dns_db_secure_t with a simple bool since it no longer needs
to be tri-valued.
the DNS_DBFIND_VALIDATEGLUE and DNS_DBFIND_FORCENSEC options
were never set, so the code implementing them (which in the case
of _VALIDATEGLUE appears to have been quite outdated anyway) was
never reached. they have now been removed.
BIND's rdataset structure is a view of some DNS records. It is
polymorphic, so the details of how the records are stored can vary.
For instance, the records can be held in an rdatalist, or in an
rdataslab in the rbtdb.
The dns_rdataset structure previously had a number of fields called
`private1` up to `private7`, which were used by the various rdataset
implementations. It was not at all clear what these fields were for,
without reading the code and working it out from context.
This change makes the rdataset inheritance hierarchy more clear. The
polymorphic part of a `struct dns_rdataset` is now a union of structs,
each of which is named for the class of implementation using it. The
fields of these structs replace the old `privateN` fields. (Note: the
term "inheritance hierarchy" refers to the fact that the builtin and
SDLZ implementations are based on and inherit from the rdatalist
implementation, which in turn inherits from the generic rdataset.
Most of this change is mechanical, but there are a few extras.
In keynode.c there were a number of REQUIRE()ments that were not
necessary: they had already been checked by the rdataset method
dispatch code. On the other hand, In ncache.c there was a public
function which needed to REQUIRE() that an rdataset was valid.
I have removed lots of "reset iterator state" comments, because it
should now be clear from `target->iter = NULL` where before
`target->private5 = NULL` could have been doing anything.
Initialization is a bit neater in a few places, using C structure
literals where appropriate.
The pointer arithmetic for translating between an rdataslab header and
its raw contents is now fractionally safer.
- use externally accessible functions for attachnode/detachnode
so these functions can be moved outside rbtdb.c
- simplify and tidy up some other functions
- use struct initializers when appropriate
- remove the flag RDATASET_ATTR_RETAIN; it was never being set
- renamed the rdataset attributes to
- remove the 'slab_methods' rdataset implementation. this was
a reduced set of slab rdataset methods, omitting 'setownercase()'
and 'getownercase()'. we can get the identical result by using
an DNS_RDATASETATTR_KEEPCASE attribute in rdatasets that
shouldn't have their case modified, and then we only need one
set of rdataset methods.
in the past there was overlap between the fields used
as resolver fetch options and ADB addrinfo flags. this has
mostly been eliminated; now we can clean up the rest of
it and remove some confusing comments.
The ability to read legacy HMAC-MD5 K* keyfile pairs using algorithm
number 157 was accidentally lost when the algorithm numbers were
consolidated into a single block, in commit
09f7e0607a.
The assumption was that these algorithm numbers were only known
internally, but they were also used in key files. But since HMAC-MD5
got renumbered from 157 to 160, legacy HMAC-MD5 key files no longer
work.
Move HMAC-MD5 back to 157 and GSSAPI back to 160. Add exception for
GSSAPI to list_hmac_algorithms.
The dns_zone_catz_enable_db() and dns_zone_catz_disable_db()
functions can race with similar operations in the catz module
because there is no synchronization between the threads.
Add catz functions which use the view's catalog zones' lock
when registering/unregistering the database update notify callback,
and use those functions in the dns_zone module, instead of doing it
directly.
view->adb may be referenced while the view is shutting down as the
zone uses a weak reference to the view and examines view->adb but
dns_view_detach call dns_adb_detach to clear view->adb.
- style fixes and general tidying-up in tkey.c
- remove the unused 'intoken' parameter from dns_tkey_buildgssquery()
- remove an unnecessary call to dns_tkeyctx_create() in ns_server_create()
(the TKEY context that was created there would soon be destroyed and
another one created when the configuration was loaded).
since it is not necessary to find partial matches when looking
up names in a TSIG keyring, we can use a hash table instead of
an RBT to store them.
the tsigkey object now stores the key name as a dns_fixedname
rather than allocating memory for it.
the `name` parameter to dns_tsigkeyring_add() has been removed;
it was unneeded since the tsigkey object already contains a copy
of the name.
the opportunistic cleanup_ring() function has been removed;
it was only slowing down lookups.
the prior practice of passing a dns_name containing the
expanded name of an algorithm to dns_tsigkey_create() and
dns_tsigkey_createfromkey() is unnecessarily cumbersome;
we can now pass the algorithm number instead.
- remove the 'ring' parameter from dns_tsigkey_createfromkey(),
and use dns_tsigkeyring_add() to add key objects to a keyring instead.
- add a magic number to dns_tsigkeyring_t
- change dns_tsigkeyring_dumpanddetach() to dns_tsigkeyring_dump();
we now call dns_tsigkeyring_detach() separately.
- remove 'maxgenerated' from dns_tsigkeyring_t since it never changes.
use the ISC_REFCOUNT attach/detach implementation in dns/tsig.c
so that detailed tracing can be used during refactoring.
dns_tsig_keyring_t has been renamed dns_tsigkeyring_t so the type
and the attach/detach function names will match.
- style cleanups.
- simplify the function parameters to dns_tsigkey_create():
+ remove 'restored' and 'generated', they're only ever set to false.
+ remove 'creator' because it's only ever set to NULL.
+ remove 'inception' and 'expiry' because they're only ever set to
(0, 0) or (now, now), and either way, this means "never expire".
+ remove 'ring' because we can just use dns_tsigkeyring_add() instead.
- rename dns_keyring_restore() to dns_tsigkeyring_restore() to match the
rest of the functions operating on dns_tsigkeyring objects.
When create_fetch() in the dns_validator unit detects deadlock, it
returns DNS_R_NOVALIDSIG, but it didn't attach to the validator. The
other condition to returning result != ISC_R_SUCCESS would be error from
dns_resolver_createfetch(). The caller (in two places out of three)
would detect the error condition and always detach from the validator.
Move the dns_validator_detach() on dns_resolver_createfetch() error
condition to create_fetch() function and cleanup the extra detaches in
seek_dnskey() and get_dsset().
Remove the code implementing nonstardard behaviors that were formerly
needed to allow GSS-TSIG to work with Windows 2000, which passed
End-of-Life in 2010.
Deprecate the "oldgsstsig" command and "-o" command line option
to nsupdate; these are now treated as synonyms for "gsstsig" and "-g"
respectively.
it was possible to add a TSIG key to more than one TSIG
keyring at a time, and this was in fact happening with the
session key, which was generated once and then added to the
keyrings for each view as it was configured.
this has been corrected and a REQUIRE added to dns_tsigkeyring_add()
to prevent it from happening again.
The following code block repeats quite often:
if (rdata.type == dns_rdatatype_dnskey ||
rdata.type == dns_rdatatype_cdnskey ||
rdata.type == dns_rdatatype_cds)
Introduce a new function to reduce the repetition.
The raw zone is not supposed to be signed. DNSKEY records in a raw zone
should not trigger zone signing. The update code needs to be able to
identify when it is working on a raw zone. Add dns_zone_israw() and
dns_zone_issecure() enable it to do this. Also, we need to check the
case for 'auto-dnssec maintain'.
A `dns_qmpulti_t` no longer needs to know about its loopmgr. We no
longer keep a linked list of `dns_qpmulti_t` that have reclamation
work, and we no longer mark chunks with the phase in which they are to
be reclaimed. Instead, empty chunks are listed in an array in a
`qp_rcu_t`, which is passed to call_rcu().
If the resolver received a FORMERR response to a request with
an DNS COOKIE option present that echoes the option back, resend
the request without an DNS COOKIE option present.
After the dns_xfrin was changed to use network manager, the maximum
global (max-transfer-time-in) and idle (max-transfer-idle-in) times for
incoming transfers were turned inoperational because of missing
implementation.
Restore this functionality by implementing the timers for the incoming
transfers.
It should be floor(DNS_NAME_MAXWIRE / 2) + 1 == 128
The mistake was introduced in c6bf51492d because:
* I was refactoring an existing `DNS_MAX_LABELS` defined as 127
* There was a longstanding bug in `dns_name_isvalid()` which
checked the number of labels against 127U instead of 128
* I mistakenly thought `dns_name_isvalid()` was correct and
`dns_name_countlabels()` was incorrect, but the reverse was true.
After this commit, occurrances of `DNS_NAME_MAXLABELS` with value
128 are consistent with the use of 127 or 128 before commit
c6bf51492d except for the mistake in `dns_name_isvalid()`.
This commit adds a test case that checks the MAXLABELS case
in `dns_name_fromtext()` and `dns_name_isvalid()`.
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix to cleanup races on shutdown.)
One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.
After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
Revert refcount debug tracing (commit a8b29f0365), there are better
ways to do it.
Use the dns_qpmethods_t typedef where appropriate.
Some stylistic improvements.
It is sometimes necessary to access a qp-trie outside an isc_loop,
such as in tests or an isc_work callback. The best option was to use
a `dns_qpmulti_write()` transaction, but that has overheads that are
not necessary for read-only access, such as committing a new version
of the trie even when nothing changed.
So this commit adds a `dns_qpmulti_read()` transaction, which is
nearly as lightweight as a query transaction, but it takes the mutex
like a write transaction.
This is the first of the "fancy" searches that know how the DNS
namespace maps on to the structure of a qp-trie. For example, it will
find the closest enclosing zone in the zone tree.
The dns_adbentry_overquota() was violating the layers accessing the
adbentry struct members directly. Change it to dns_adb_overquota() to
match the dns_adb API.
use the ISC_REFCOUNT implementation for dns_zone_attach() and
_detach(). (this applies only to external zone references, not
to dns_zone_iattach() and dns_zone_idetach().)
use dns_zone_ref() where previously a dummy zone object had been
used to increment the reference count.
My original idea had been that the core qp-trie code would be mostly
independent of the storage for keys, so I did not make it check at run
time that key lengths are sensible. However, the qp-trie search
routines need to get keys out of leaf objects, for which they provide
storage on the stack, which is particularly dangerous for unchecked
buffer overflows. So this change checks that key lengths are in bounds
at the API boundary between the qp-trie code and the rest of BIND, and
there is no more pretence that keys might be longer.
Add a new configuration option to set how the checkds method should
work. Acceptable values are 'yes', 'no', and 'explicit'.
When set to 'yes', the checkds method is to lookup the parental agents
by querying the NS records of the parent zone.
When set to 'no', no checkds method is enabled. Users should run
the 'rndc checkds' command to signal that DS records are published and
withdrawn.
When set to 'explicit', the parental agents are explicitly configured
with the 'parental-agents' configuration option.
This should have no functional effects.
The message size stats are specified by RSSAC002 so it's best not
to mess around with how they appear in the statschannel. But it's
worth changing the implementation to use general-purpose histograms,
to reduce code size and benefit from sharded counters.
Cleanup the remnants of MS Compiler bits from <isc/refcount.h>, printing
the information in named/main.c, and cleanup some comments about Windows
that no longer apply.
The bits in picohttpparser.{h,c} were left out, because it's not our
code.
The isc_fsaccess API was created to hide the implementation details
between POSIX and Windows APIs. As we are not supporting the Windows
APIs anymore, it's better to drop this API used in the DST part.
Moreover, the isc_fsaccess was setting the permissions in an insecure
manner - it operated on the filename, and not on the file descriptor
which can lead to all kind of attacks if unpriviledged user has read (or
even worse write) access to key directory.
Replace the code that operates on the private keys with code that uses
mkstemp(), fchmod() and atomic rename() at the end, so at no time the
private key files have insecure permissions.
The only place where dns_name_hash() was being used is the old hash
table in the dns_badcache unit. Squash the dns_name_fullhash() and
dns_name_hash() into single dns_name_hash() function that's always
case-insensitive as it doesn't make to do case-sensitive hashing of the
domain names and we were not using this anywhere.
Instead of marking the unused entities with UNUSED(x) macro in the
function body, use a `ISC_ATTR_UNUSED` attribute macro that expans to
C23 [[maybe_unused]] or __attribute__((__unused__)) as fallback.
Change the isc_job_run() to not-make any allocations. The caller must
make sure that it allocates isc_job_t - usually as part of the argument
passed to the callback.
For simple jobs, using isc_async_run() is advised as it allocates its
own separate isc_job_t.
for testing purposes, we need to be able to specify a library path from
which to load the dnsrps implementation. this can now be done with the
"dnsrps-library" option.
DNSRPS can now be enabled in configure regardless of whether librpz.so
is currently installed on the system.
the new dns_view_addtrustedkey() function allows a view's trust
anchors to be updated directly. this code was formerly in
dns_client_addtrustedkey(), which is now a wrapper around
dns_view_addtrustedkey().
stop and restart the server in the 'tsiggss' test, in order
to confirm that GSS negotiated TSIG keys are saved and restored
when named loads.
added logging to dns_tsigkey_createfromkey() to indicate whether
a key has been statically configured, generated via GSS negotiation,
or restored from a file.
REQUIRE that rdata->type is dns_rdatatype_svcb to detect when
dns_rdata_checksvcb is called with the wrong rdata type. There are
no code paths that currently pass the wrong rdata to dns_rdata_checksvcb.
This was found by GCC 12 static analysis.
without diffie-hellman TKEY negotiation, some other code is
now effectively dead or unnecessary, and can be cleaned up:
- the rndc tsig-list and tsig-delete commands.
- a nonoperational command-line option to dnssec-keygen that
was documented as being specific to DH.
- the section of the ARM that discussed TKEY/DH.
- the functions dns_tkey_builddeletequery(), processdeleteresponse(),
and tkey_processgssresponse(), which are unused.
Completely remove the TKEY Mode 2 (Diffie-Hellman Exchanged Keying) from
BIND 9 (from named, named.conf and all the tools). The TKEY usage is
fringe at best and in all known cases, GSSAPI is being used as it should.
The draft-eastlake-dnsop-rfc2930bis-tkey specifies that:
4.2 Diffie-Hellman Exchanged Keying (Deprecated)
The use of this mode (#2) is NOT RECOMMENDED for the following two
reasons but the specification is still included in Appendix A in case
an implementation is needed for compatibility with old TKEY
implementations. See Section 4.6 on ECDH Exchanged Keying.
The mixing function used does not meet current cryptographic
standards because it uses MD5 [RFC6151].
RSA keys must be excessively long to achieve levels of security
required by current standards.
We might optionally implement Elliptic Curve Diffie-Hellman (ECDH) key
exchange mode 6 if the draft ever reaches the RFC status. Meanwhile the
insecure DH mode needs to be removed.
There are leftovers from the previous refactoring effort, which left
some function declarations and comments in the header file unchanged.
Finish the renaming.
This implements node reference tracing that passes all the internal
layers from dns_db API (and friends) to increment_reference() and
decrement_reference().
It can be enabled by #defining DNS_DB_NODETRACE in <dns/trace.h> header.
The output then looks like this:
incr:node:check_address_records:rootns.c:409:0x7f67f5a55a40->references = 1
decr:node:check_address_records:rootns.c:449:0x7f67f5a55a40->references = 0
incr:nodelock:check_address_records:rootns.c:409:0x7f67f5a55a40:0x7f68304d7040->references = 1
decr:nodelock:check_address_records:rootns.c:449:0x7f67f5a55a40:0x7f68304d7040->references = 0
There's associated python script to find the missing detach located at:
https://gitlab.isc.org/isc-projects/bind9/-/snippets/1038
Now that we can configure a different digest type, update the code
to honor the configuration. Update 'dns_dnssec_syncupdate' so that
the correct CDS record is published, and also when deleting CDS records,
ensure that all possible CDS records are removed from the zone.
In general, it's better to do one thorough compaction when a batch of
work is complete, which is the way that `update` transactions work.
Conversely, `write` transactions are designed so that lots of little
transactions are not too inefficient, but they need explicit
compaction. This changes `dns_qp_compact()` so that it is easier to
compact any time that makes sense, if there isn't a better way to
schedule compaction. And `dns_qpmulti_commit()` only recycles garbage
when there is enough to make it worthwhile.
Add some qp-trie tracing macros which can be enabled by a
developer. These print a message when a leaf is attached or
detached, indicating which part of the qp-trie implementation
did so. The refcount methods must now return the refcount value
so it can be printed by the trace macros.
The first working multi-threaded qp-trie was stuck with an unpleasant
trade-off:
* Use `isc_rwlock`, which has acceptable write performance, but
terrible read scalability because the qp-trie made all accesses
through a single lock.
* Use `liburcu`, which has great read scalability, but terrible
write performance, because I was relying on `rcu_synchronize()`
which is rather slow. And `liburcu` is LGPL.
To get the best of both worlds, we need our own scalable read side,
which we now have with `isc_qsbr`. And we need to modify the write
side so that it is not blocked by readers.
Better write performance requires an async cleanup function like
`call_rcu()`, instead of the blocking `rcu_synchronize()`. (There
is no blocking cleanup in `isc_qsbr`, because I have concluded
that it would be an attractive nuisance.)
Until now, all my multithreading qp-trie designs have been based
around two versions, read-only and mutable. This is too few to
work with asynchronous cleanup. The bare minimum (as in epoch
based reclamation) is three, but it makes more sense to support an
arbitrary number. Doing multi-version support "properly" makes
fewer assumptions about how safe memory reclamation works, and it
makes snapshots and rollbacks simpler.
To avoid making the memory management even more complicated, I
have introduced a new kind of "packed reader node" to anchor the
root of a version of the trie. This is simpler because it re-uses
the existing chunk lifetime logic - see the discussion under
"packed reader nodes" in `qp_p.h`.
I have also made the chunk lifetime logic simpler. The idea of a
"generation" is gone; instead, chunks are either mutable or
immutable. And the QSBR phase number is used to indicate when a
chunk can be reclaimed.
Instead of the `shared_base` flag (which was basically a one-bit
reference count, with a two version limit) the base array now has a
refcount, which replaces the confusing ad-hoc lifetime logic with
something more familiar and systematic.
Adjust the dns_qp_memusage() and dns_qp_compact() functions
to be more informative and flexible about handling fragmentation.
Avoid wasting space in runt chunks.
Switch from twigs_mutable() to cells_immutable() because that is the
sense we usually want.
Drop the redundant evacuate() function and rename evacuate_twigs() to
evacuate(). Move some chunk test functions closer to their point of
use.
Clarify compact_recursive(). Some small cleanups to comments.
Use isc_time_monotonic() for qp-trie timing stats.
Use #define constants to control debug logging.
Set up DNS name label offsets in dns_qpkey_fromname() so it is easier
to use in cases where the name is not fully hydrated.
A qp-trie is a kind of radix tree that is particularly well-suited to
DNS servers. I invented the qp-trie in 2015, based on Dan Bernstein's
crit-bit trees and Phil Bagwell's HAMT. https://dotat.at/prog/qp/
This code incorporates some new ideas that I prototyped using
NLnet Labs NSD in 2020 (optimizations for DNS names as keys)
and 2021 (custom allocator and garbage collector).
https://dotat.at/cgi/git/nsd.git
The BIND version of my qp-trie code has a number of improvements
compared to the prototype developed for NSD.
* The main omission in the prototype was the very sketchy outline of
how locking might work. Now the locking has been implemented,
using a reader/writer lock and a mutex. However, it is designed to
benefit from liburcu if that is available.
* The prototype was designed for two-version concurrency, one
version for readers and one for the writer. The new code supports
multiversion concurrency, to provide a basis for BIND's dbversion
machinery, so that updates are not blocked by long-running zone
transfers.
* There are now two kinds of transaction that modify the trie: an
`update` aims to support many very small zones without wasting
memory; a `write` avoids unnecessary allocation to help the
performance of many small changes to the cache.
* There is also a single-threaded interface for situations where
concurrent access is not necessary.
* The API makes better use of types to make it more clear which
operations are permitted when.
* The lookup table used to convert a DNS name to a qp-trie key is
now initialized by a run-time constructor instead of a programmer
using copy-and-paste. Key conversion is more flexible, so the
qp-trie can be used with keys other than DNS names.
* There has been much refactoring and re-arranging things to improve
the terminology and order of presentation in the code, and the
internal documentation has been moved from a comment into a file
of its own.
Some of the required functionality has been stripped out, to be
brought back later after the basics are known to work.
* Garbage collector performance statistics are missing.
* Fancy searches are missing, such as longest match and
nearest match.
* Iteration is missing.
* Search for update is missing, for cases where the caller needs to
know if the value object is mutable or not.
Some qp-trie operations will need to know the maximum number of labels
in a name, so I wanted a standard macro definition with the right
value.
Replace DNS_MAX_LABELS from <dns/resolver.h with DNS_NAME_MAXLABELS in
<dns/name.h>, and add its counterpart DNS_NAME_LABELLEN.
Use these macros in `name.c` and `resolver.c`.
Fix an off-by-one error in an assertion in `dns_name_countlabels()`.
Instead of holding the catzs->lock the whole time we process the catz
update, only hold it for hash table lookup and then release it. This
should unblock any other threads that might be processing updates to
catzs triggered by extra incoming transfer.
Offload catalog zone processing so that the network manager threads
are not interrupted by a large catalog zone update.
Introduce a new 'updaterunning' state alongside with 'updatepending',
like it is done in the RPZ module.
Note that the dns__catz_update_cb() function currently holds the
catzs->lock during the whole process, which is far from being optimal,
but the issue is going to be addressed separately.
This change should make sure that catalog zone update processing
doesn't happen when the catalog zone is being shut down. This
should help avoid races when offloading the catalog zone updates
in the follow-up commit.
* Change 'dns_catz_new_zones()' function's prototype (the order of the
arguments) to synchronize it with the similar function in rpz.c.
* Rename 'refs' to 'references' in preparation of ISC_REFCOUNT_*
macros usage for reference tracking.
* Unify dns_catz_zone_t naming to catz, and dns_catz_zones_t naming to
catzs, following the logic of similar changes in rpz.c.
* Use C compound literals for structure initialization.
* Synchronize the "new zone version came too soon" log message with the
one in rpz.c.
* Use more of 'sizeof(*ptr)' style instead of the 'sizeof(type_t)' style
expressions when allocating or freeing memory for 'ptr'.
`libirs` used to be a reference implementation of `getaddrinfo` and
related modern resolver APIs. It was stripped down in BIND 9.18
leaving only the `irs_resconf` module, which parses
`/etc/resolv.conf`. I have kept its include path and namespace prefix,
so it remains a little fragment of libirs now embedded in libdns.
the dns_xfrin module was still using the network manager directly to
manage TCP connections and send and receive messages. this commit
changes it to use the dispatch manager instead.
use ISC_REFCOUNT_IMPL for dns_xfrin_ctx_t (which has been renamed
to dns_xfrin_t to keep the function names dns_xfrin_attach() and
dns_xfrin_detach() unchanged).
the 'dispatchmgr' member of the resolver object is used by both
the dns_resolver and dns_request modules, and may in the future
be used by others such as dns_xfrin. it doesn't make sense for it
to live in the resolver object; this commit moves it into dns_view.
removed references in code comments, doc/dev documentation, etc, to
isc_task, isc_timer_reset(), and isc_timertype_inactive. also removed a
coccinelle patch related to isc_timer_reset() that was no longer needed.
move all dns_sdb code into bin/named/builtin.c, which is the
only place from which it's called.
(note this is temporary: later we'll refactor builtin so that it's a
standalone dns_db implementation on its own instead of using SDB
as a wrapper.)
move database attach/detach functions to db.c, instead of
requiring them to be implemented for every database type.
instead, they must implement a 'destroy' function that is
called when references go to zero.
this enables us to use ISC_REFCOUNT_IMPL for databases,
with detailed tracing enabled by setting DNS_DB_TRACE to 1.
SDB is currently (and foreseeably) only used by the named
builtin databases, so it only needs as much of its API as
those databases use.
- removed three flags defined for the SDB API that were always
set the same by builtin databases.
- there were two different types of lookup functions defined for
SDB, using slightly different function signatures. since backward
compatibility is no longer a concern, we can eliminate the 'lookup'
entry point and rename 'lookup2' to 'lookup'.
- removed the 'allnodes' entry point and all database iterator
implementation code
- removed dns_sdb_putnamedrr() and dns_sdb_putnamedrdata() since
they were never used.
some dns_db functions would have crashed if the DB implementation failed
to implement them, requiring the implementations to add functions that
did nothing but return ISC_R_NOTIMPLEMENTED or some obvious default
value. we can just have the dns_db wrapper functions themselves return
those values, and clean up the implementations accordingly.
make the private isc__rdatalist_* functions public dns_rdatalist
functions so that all the rdatalist primitives can be used by
callers to libdns. (this will be needed later for moving SDB and
SDLZ out of libdns.)
as every validator function is loop-synchronized, it should no longer be
necessary to use a validator lock.
calling dns_validator_send(), dns_validator_cancel() or
dns_validator_destroy() from a thread other than the one on which the
validator is running will now cause an assertion failure; this should be
fine since the validator and resolver are tightly coupled, and the fetch
contexts and validators run in the same loops.
refactor validator so that the validation status object (previously
called dns_valstatus_t, which was derived from dns_validatorevent_t), is
now part of the dns_validator object. when calling validator callbacks,
the validator itself is now sent as the argument.
(note: this necessitates caution in the callback functions that are
internal to validator.c validators spawn other validators, and it can be
confusing at times whether we need to be looking at val, val->subvalidator,
or val->parent.)
as there is no further use of isc_task in BIND, this commit removes
it, along with isc_taskmgr, isc_event, and all other related types.
functions that accepted taskmgr as a parameter have been cleaned up.
as a result of this change, some functions can no longer fail, so
they've been changed to type void, and their callers have been
updated accordingly.
the tasks table has been removed from the statistics channel and
the stats version has been updated. dns_dyndbctx has been changed
to reference the loopmgr instead of taskmgr, and DNS_DYNDB_VERSION
has been udpated as well.
change functions using isc_taskmgr_beginexclusive() to use
isc_loopmgr_pause() instead.
also, removed an unnecessary use of exclusive mode in
named_server_tcptimeouts().
most functions that were implemented as task events because they needed
to be running in a task to use exclusive mode have now been changed
into loop callbacks instead. (the exception is catz, which is being
changed in a separate commit because it's a particularly complex change.)
callback events from dns_resolver_createfetch() are now posted
using isc_async_run.
other modules which called the resolver and maintained task/taskmgr
objects for this purpose have been cleaned up.
The callbacks from dns_abd_createfind() are now posted using
isc_async_run() instead of isc_task_send(). ADB event types
have been replaced with a new dns_adbstatus_t type which is
included as find->status.
(The ADB still uses a task for dns_resolver_createfetch().)
dns_request_create() and _createraw() now take a 'loop' parameter
and run the callback event on the specified loop.
as the task manager is no longer used, it has been removed from
the dns_requestmgr structure. the dns_resolver_taskmgr() function
is also no longer used and has been removed.
- removed documentation of -S option from named man page
- removed documentation of reserved-sockets from ARM
- simplified documentation of dnssec-secure-to-insecure - it
now just says it's obsolete rather than describing what it
doesn't do anymore
- marked three formerly obsolete options as ancient:
parent-registration-delay, reserved-sockets, and
suppress-initial-notify
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
Instead of using an extra rarely-used paramater to dns_clientinfo_init()
to set ECS information for a client, this commit adds a function
dns_clientinfo_setecs() which can be called only when ECS is needed.
* rbt node chains were sized to allow for bitstring labels, so they
had 256 levels; but in the absence of bistrings, 128 is enough.
* dns_byaddr_createptrname() had a redundant options argument,
and a very outdated doc comment.
* A number of comments referred to bitstring labels in a way that is
no longer helpful. (A few informative comments remain.)
Previously, the zone loading and dumping was effectively serialized by
the dns_io_t mechanism. In theory, more IO operations could be run in
parallel, but the zone manager .iolimit was set to 1 and never increased
as dns_zonemgr_setiolimit() was never ever called.
As the dns_master asynchronous load and dump was already offloaded to
non-worker threads with isc_work mechanism, drop the whole dns_io_t
and just rely on the isc_work to do the load and dump scheduling.
Instead of dumping the signed zone contents node by node during the
signing, dump the entire zone at the end. This was already done for the
raw zone format, but it shows that the IO is better utilized when the
zone dump is done in one single write rather than in small chunks.
A side effect of dumping node by node was that all names were printed
relative to the zone origin rather than being grouped under different
$ORIGINs as would normally be the case when dumping a zone. Also, state
was not maintained from one node to the next regarding whether the CLASS
has already been printed, so it was always included with the first
record of each node.
Since dnssec-signzone uses the dns_master_style_explicittl text format
style, and is the only application that does so, we can revise that
style and add a new DNS_STYLEFLAG_CLASS_PERNAME flag to get the output
back to what it was before this change.
Add support for loading and validating the 'tls' parameter from
the forwarders' configuration.
This prepares ground for adding support to forward queries to
DoT-enabled upstream servers.
DSCP has not been fully working since the network manager was
introduced in 9.16, and has been completely broken since 9.18.
This seems to have caused very few difficulties for anyone,
so we have now marked it as obsolete and removed the
implementation.
To ensure that old config files don't fail, the code to parse
dscp key-value pairs is still present, but a warning is logged
that the feature is obsolete and should not be used. Nothing is
done with configured values, and there is no longer any
range checking.
With 'stale-answer-enable yes;' and 'stale-answer-client-timeout off;',
consider the following situation:
A CNAME record and its target record are in the cache, then the CNAME
record expires, but the target record is still valid.
When a new query for the CNAME record arrives, and the query fails,
the stale record is used, and then the query "restarts" to follow
the CNAME target. The problem is that the query's multiple stale
options (like DNS_DBFIND_STALEOK) are not reset, so 'query_lookup()'
treats the restarted query as a lookup following a failed lookup,
and returns a SERVFAIL answer when there is no stale data found in the
cache, even if there is valid non-stale data there available.
With this change, query_lookup() now considers non-stale data in the
cache in the first place, and returns it if it is available.
Remove parsing the configuration options 'alt-transfer-source',
'alt-transfer-source-v6', and 'use-alt-transfer-source', and remove
the corresponding code that implements the feature.
The dns_remote_t structure is intended to replace the variables in
the structure that deals with remote server communication to primaries,
parental agents, forwarders, etc.
The dispatches are not thread-bound, and used freely between various
threads (see the dns_resolver and dns_request units for details).
This refactoring make sure that all non-const dns_dispatch_t and
dns_dispentry_t members are accessed under a lock, and both object now
track their internal state (NONE, CONNECTING, CONNECTED, CANCELED)
instead of guessing the state from the state of various struct members.
During the refactoring, the artificial limit DNS_DISPATCH_SOCKSQUOTA on
UDP sockets per dispatch was removed as the limiting needs to happen and
happens on in dns_resolver and limiting the number of UDP sockets
artificially in dispatch could lead to unpredictable behaviour in case
one dispatch has the limit exhausted by others are idle.
The TCP artificial limit of DNS_DISPATCH_MAXREQUESTS makes even less
sense as the TCP connections are only reused in the dns_request API
that's not a heavy user of the outgoing connections.
As a side note, the fact that UDP and TCP dispatch pretends to be same
thing, but in fact the connected UDP is handled from dns_dispentry_t and
dns_dispatch_t acts as a broker, but connected TCP is handled from
dns_dispatch_t and dns_dispatchmgr_t acts as a broker doesn't really
help the clarity of this unit.
This refactoring kept to API almost same - only dns_dispatch_cancel()
and dns_dispatch_done() were merged into dns_dispatch_done() as we need
to cancel active netmgr handles in any case to not leave dangling
connections around. The functions handling UDP and TCP have been mostly
split to their matching counterparts and the dns_dispatch_<function>
functions are now thing wrappers that call <udp|tcp>_dispatch_<function>
based on the socket type.
More debugging-level logging was added to the unit to accomodate for
this fact.
The dns_adb_getcookie() doesn't use the 'adb' parameter, remove it.
Refactor the dns_adb_getcookie() function to just return the size of
the cookie when the caller passes 'NULL' as the 'cookie' argument.
'DNS_DB_STALEOK' returns stale rdatasets as well as current rdatasets.
'DNS_DB_EXPIREDOK' returns expired rdatasets as well as current
rdatasets. This option is currently only set when DNS_DB_STALEOK is
also set.
dns_db_updatenotify_unregister needed to be called earlier to ensure
that listener->onupdate_arg always points to a valid object. The
existing lazy cleanup in rbtdb_free did not ensure that.
This is second in the series of fixing the usage of hashtables in the
dns_adb and the dns_resolver units.
Currently, the fetch buckets (used to hold the fetch context) and zone
buckets (used to hold per-domain counters) would never get cleaned from
the memory. Combined with the fact that the hashtable now grows as
needed (instead of using hashtable as buckets), the memory usage in the
resolver can just grow and it never drops down.
In this commit, the usage of hashtables (hashmaps) has been completely
rewritten, so there are no "buckets" and all the matching conditions are
directly mapped into the hashtable key:
1. For per-domain counter hashtable, this is simple as the lowercase
domain name is used directly as a counter.
2. For fetch context hashtable, this requires copying some extra flags
back and forth in the key.
As we don't hold the "buckets" forever, the cleaning mechanism has been
rewritten as well:
1. For per-domain counter hashtable, this is again much simpler, as we
only need to check whether the usage counter is still zero under the
lock and bail-out on cleaning if the counter is in use.
2. For fetch context hashtable, this is more complicated as the fetch
context cannot be reused after it has been finished. The algorithm
is different, the fetch context is always removed from the
hashtable, but if we find the fetch context that has been marked
as finished in the lookup function, we help with the cleaning from
the hashtable and try again.
Couple of additional changes have been implemented in this refactoring
as those were needed for correct functionality and could not be split
into individual commits (or would not make sense as seperate commits):
1. The dns_resolver_createfetch() has an option to create "unshared"
fetch. The "unshared" fetch will never get matched, so there's
little point in storing the "unshared" fetch in the hashtable.
Therefore the "unshared" fetches are now detached from the
hashtable and live just on their own.
2. Replace the custom reference counting with ISC_REFCOUNT_DECL/IMPL
macros for better tracing.
3. fctx_done_detach() is idempotent, it makes the "final" detach (the
one matching the create function) only once. But that also means
that it has to be called before the detach that kept the fetch
context alive in the callback. A new macro fctx_done_unref() has
been added to allow this code flow:
fctx_done_unref(fctx, result);
fctx_detach(&fctx);
Doing this the other way around could cause fctx to get destroyed in
the fctx_unref() first and fctx_done_detach() would cause UAF.
4. The resume_qmin() and resume_dslookup() callbacks have been
refactored for more readability and simpler code paths. The
validated() callback has also received some of the simplifications,
but it should be refactored in the future as it is bit of spaghetti
now.
The mechanism for associating a worker task to a database now
uses loops rather than tasks.
For this reason, the parameters to dns_cache_create() have been
updated to take a loop manager rather than a task manager.
After change 5995, zone transfers were using a small
compression context that only had space for the first
few dozen names in each message. They now use a large
compression context with enough space for every name.
The dns_adb unit has been refactored to be much simpler. Following
changes have been made:
1. Simplify the ADB to always allow GLUE and hints
There were only two places where dns_adb_createfind() was used - in
the dns_resolver unit where hints and GLUE addresses were ok, and in
the dns_zone where dns_adb_createfind() would be called without
DNS_ADBFIND_HINTOK and DNS_ADBFIND_GLUEOK set.
Simplify the logic by allowing hint and GLUE addresses when looking
up the nameserver addresses to notify. The difference is negligible
and would cause a difference in the notified addresses only when
there's mismatch between the parent and child addresses and we
haven't cached the child addresses yet.
2. Drop the namebuckets and entrybuckets
Formerly, the namebuckets and entrybuckets were used to reduced the
lock contention when accessing the double-linked lists stored in each
bucket. In the previous refactoring, the custom hashtable for the
buckets has been replaced with isc_ht/isc_hashmap, so only a single
item (mostly, see below) would end up in each bucket.
Removing the entrybuckets has been straightforward, the only matching
was done on the isc_sockaddr_t member of the dns_adbentry.
Removing the zonebuckets required GLUEOK and HINTOK bits to be
removed because the find could match entries with-or-without the bits
set, and creating a custom key that stores the
DNS_ADBFIND_STARTATZONE in the first byte of the key, so we can do a
straightforward lookup into the hashtable without traversing a list
that contains items with different flags.
3. Remove unassociated entries from ADB database
Previously, the adbentries could live in the ADB database even after
unlinking them from dns_adbnames. Such entries would show up as
"Unassociated entries" in the ADB dump. The benefit of keeping such
entries is little - the chance that we link such entry to a adbname
is small, and it's simpler to evict unlinked entries from the ADB
cache (and the hashtable) than create second LRU cleaning mechanism.
Unlinked ADB entries are now directly deleted from the hash
table (hashmap) upon destruction.
4. Cleanup expired entries from the hash table
When buckets were still in place, the code would keep the buckets
always allocated and never shrink the hash table (hashmap). With
proper reference counting in place, we can delete the adbnames from
the hash table and the LRU list.
5. Stop purging the names early when we hit the time limit
Because the LRU list is now time ordered, we can stop purging the
names when we find a first entry that doesn't fullfil our time-based
eviction criteria because no further entry on the LRU list will meet
the criteria.
Future work:
1. Lock contention
In this commit, the focus was on correctness of the data structure,
but in the future, the lock contention in the ADB database needs to
be addressed. Currently, we use simple mutex to lock the hash
tables, because we almost always need to use a write lock for
properly purging the hashtables. The ADB database needs to be
sharded (similar to the effect that buckets had in the past). Each
shard would contain own hashmap and own LRU list.
2. Time-based purging
The ADB names and entries stay intact when there are no lookups.
When we add separate shards, a timer needs to be added for time-based
cleaning in case there's no traffic hashing to the inactive shard.
3. Revisit the 30 minutes limit
The ADB cache is capped at 30 minutes. This needs to be revisited,
and at least the limit should be configurable (in both directions).
The dns_adb would serialize all fetches on a single task. Create a
per-thread task, so the fetches will stay local to the thread that
initiated the fetch.
The dns_rpz_zones structure was using .refs and .irefs for strong and
weak reference counting. Rewrite the unit to use just a single
reference counting + shutdown sequence (dns_rpz_destroy_rpzs) that must
be called by the creator of the dns_rpz_zones_t object. Remove the
reference counting from the dns_rpz_zone structure as it is not needed
because the zone objects are fully embedded into the dns_rpz_zones
structure and dns_rpz_zones_t object must never be destroyed before all
dns_rpz_zone_t objects.
The dns_rps_zones_t reference counting uses the new ISC_REFCOUNT_TRACE
capability - enable by defining DNS_RPZ_TRACE in the dns/rpz.h header.
Additionally, add magic numbers to the dns_rpz_zone and dns_rpz_zones
structures.
The dns_cache API contained a cache cleaning mechanism that would be
disabled for 'rbt' based cache. As named doesn't have any other cache
implementations, remove the cache cleaning mechanism from dns_cache API.
The aim is to do less work per byte:
* Check the bounds for each label, instead of checking the
bounds for each character.
* Instead of copying one character at a time from the wire to
the name, copy entire runs of sequential labels using memmove()
to make the most of its fast loop.
* To remember where the name ends, we only need to set the end
marker when we see a compression pointer or when we reach the
root label. There is no need to check if we jumped back and
conditionally update the counter for every character.
* To parse a compression pointer, we no longer take a diversion
around the outer loop in between reading the upper byte of the
pointer and the lower byte.
* The parser state machine is now implicit in the instruction
pointer, instead of being an explicit variable. Similarly,
when we reach the root label we break directly out of the loop
instead of setting a second state machine variable.
* DNS_NAME_DOWNCASE is never used with dns_name_fromwire() so
that option is no longer supported.
I have removed this comment which dated from January 1999 when
dns_name_fromwire() was first introduced:
/*
* Note: The following code is not optimized for speed, but
* rather for correctness. Speed will be addressed in the future.
*/
No functional change, apart from removing support for the unused
DNS_NAME_DOWNCASE option. The new code is about 2x faster than the
old code: best case 11x faster, worst case 1.4x faster.
There were a number of places where the zone table should have been
locked, but wasn't, when dns_zt_apply was called.
Added a isc_rwlocktype_t type parameter to dns_zt_apply and adjusted
all calls to using it. Removed locks in callers.
Instead of doing incremental zone loading with fixed quantum - 100
loaded lines per event, move the zone loading process to the offloaded
libuv threads using isc_work_enqueue() API.
This has the advantage that the thread scheduling is given back to the
operating system that understands blocking operations, and the zone
loading operation doesn't block the networking threads directly.
Incremental file loads now use loopmgr events instead of task events.
The dns_master_loadstreaminc(), _loadbufferinc(), _loadlexer() and
_loadlexerinc() functions were not used in BIND, and have been removed.
dns_rdata_checksvcb performs data entry checks on SVCB records.
In particular that _dns SVBC record have an 'alpn' and if that 'alpn'
parameter indicates HTTP is in use that 'dophath' is present.
All we need for compression is a very small hash set of compression
offsets, because most of the information we need (the previously added
names) can be found in the message using the compression offsets.
This change combines dns_compress_find() and dns_compress_add() into
one function dns_compress_name() that both finds any existing suffix,
and adds any new prefix to the table. The old split led to performance
problems caused by duplicate names in the compression context.
Compression contexts are now either small or large, which the caller
chooses depending on the expected size of the message. There is no
dynamic resizing.
There is a behaviour change: compression now acts on all the labels in
each name, instead of just the last few.
A small benchmark suggests this is about 2x faster.
sizeof(dns_name_t) did not change but the boolean attributes are now
separated as one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, plus we
got rid of manual bit manipulation code.
Originally RBT node stored three lowest bits from dns_name_t attributes.
This had a curious side-effect noticed by Tony Finch:
If you create an rbt node from a DYNAMIC name then the flag will be
propagated through dns_rbt_namefromnode() ... if you subsequently call
dns_name_free() it will try to isc_mem_put() a piece of an rbt node ...
but dns_name_free() REQUIRE()s that the name is dynamic so in the usual
case where rbt nodes are created from non-dynamic names, this kind of
code will fail an assertion.
This is a bug it dates back to june 1999 when NAMEATTR_DYNAMIC was
invented.
Apparently it does not happen often :-)
I'm planning to get rid of DNS_NAMEATTR_ definitions and bit operations,
so removal of this "three-bit-subset" assignment is a first step.
We can keep only the ABSOLUTE flag in RBT node and nothing else because
names attached to rbt nodes are always readonly: The internal node_name()
function always sets the NAMEATTR_READONLY when making a dns_name that
refers to the node's name, so the READONLY flag will be set in the name
returned by dns_rbt_namefromnode().
Co-authored-by: Tony Finch <fanf@isc.org>
Getting the recorded value of 'edns-udp-size' from the resolver requires
strong attach to the dns_view because we are accessing `view->resolver`.
This is not the case in places (f.e. dns_zone unit) where `.udpsize` is
accessed. By moving the .udpsize field from `struct dns_resolver` to
`struct dns_view`, we can access the value directly even with weakly
attached dns_view without the need to lock the view because `.udpsize`
can be accessed after the dns_view object has been shut down.
The dns_view implements weak and strong reference counting. When strong
reference counting reaches zero, the adb, ntatable and resolver objects
are shut down and detached.
In dns_zone and dns_nta the dns_view was weakly attached, but the
view->resolver reference was accessed directly leading to dereferencing
the NULL pointer.
Add dns_view_getresolver() method which attaches to view->resolver
object under the lock (if it still exists) ensuring the dns_resolver
will be kept referenced until not needed.
The HMACs and GSSAPI are just using unallocated values.
Moving them around shouldn't cause issues.
Only the dnssec system test knew the internal number in use for hmacmd5.
When fuzzing it is useful for all signing operations to happen
at a specific time for reproducability. Add two variables to
the message structure (fuzzing and fuzztime) to specify if a
fixed time should be used and the value of that time.
Instead of always creating the trust anchor timer (dns_nta_t) on the
main loop, create the timer on the current loop and associate each
dns_nta_t object to the loop it was created on. This simplifies the
timer handling as everything is run on the associated loop.
During the change, the dns_nta_t structure was renamed to dns__nta_t
and changed to be fully internal to the nta.c compilation unit, and the
dns_ntatable_t structure was made opaque. This required no change to
code using the API as dns_nta_t never had any external users and the
dns_ntatable_t was properly accessed only by using function calls.
Instead of creating the response policy zone deferred update timer when
creating the response policy zone object, create it on demand on the
current loop and destroy it as soon as the timer has finished its job.
There's a side-effect - the processing of the response policy zone
update is now done on the current loop - previously, it was always on
the main loop.
Instead of creating the catalog zone deferred update timer when creating
the catalog zone object, create it on demand on the current loop and
destroy it as soon as the timer has finished its job. There's a
side-effect - the processing of the catalog zone update is now done on
the current loop - previously, it was always on the main loop.
This change prepares ground for sending DNS requests using DoT,
which, in particular, will be used for forwarding dynamic updates
to TLS-enabled primaries.
In order to make xfrin.c:get_create_tlsctx() reusable, move the function
into transport.c, and make changes into its prototype to not use the
'dns_xfrin_ctx_t' type, thus making it more universal.
This change prepares ground for adding transport support into the
dispatch manager.
Also, move the typedefs for 'dns_transport_t' and 'dns_transport_list_t'
from transport.h into types.h.
If there was a collision of key id across algorithms it was not
possible to determine where counter applies to which algorithm for
xml statistics while for json only one of the values was emitted.
The key names are now "<algorithm-number>+<id>" (e.g. "8+54274").
dns_request_create() was a front-end to dns_request_createvia() that
was only used by test binaries. dns_request_createvia() has been
renamed to dns_request_create(), and the test programs that formerly
used dns_request_create() have been updated to use the new parameters.
It is possible to bypass Response Rate Limiting (RRL)
`responses-per-second` limitation using specially crafted wildcard
names, because the current implementation, when encountering a found
DNS name generated from a wildcard record, just strips the leftmost
label of the name before making a key for the bucket.
While that technique helps with limiting random requests like
<random>.example.com (because all those requests will be accounted
as belonging to a bucket constructed from "example.com" name), it does
not help with random names like subdomain.<random>.example.com.
The best solution would have been to strip not just the leftmost
label, but as many labels as necessary until reaching the suffix part
of the wildcard record from which the found name is generated, however,
we do not have that information readily available in the context of RRL
processing code.
Fix the issue by interpreting all valid wildcard domain names as
the zone's origin name concatenated to the "*" name, so they all will
be put into the same bucket.