In the current statistics counter implementation, the statistics are
backed by an array of counters, which are updated via atomic operations.
This leads to contention, especially on high core count
machines.
This commit introduces a new isc_statsmulti_t counter that keeps a
separate array per thread. These counters are then aggregated only when
statistics are queried, shifting work off the critical path.
These changes lead to a ~2% improvement in perflab.
This adds the switch +[no]cookie to delv to control the sending of
DNS COOKIE options when sending requests. The default is to send
DNS COOKIE options.
Drop the NZF (New Zone File) fallback for persisting runtime zone
configurations, making LMDB (NZD) the only storage backend. This
removes all #ifdef HAVE_LMDB conditionals, the meson 'lmdb' option,
and the NZF-related functions. LMDB is now a mandatory build
dependency.
The named-nzd2nzf tool is now always built.
Make the maximum number of processed delegation nameservers configurable
via the new 'max-delegation-servers' option (default: 13), replacing the
hardcoded NS_PROCESSING_LIMIT (20).
The default is reduced to 13 to precisely match the maximum number of
root servers that can fit into a classic 512-byte UDP payload. This
provides a natural, historically sound cap that mitigates resource
exhaustion and amplification attacks from artificially inflated or
misconfigured delegations.
The configuration option is strictly bounded between 1 and 100 to ensure
resolver stability.
The granularity of the simple histogram with fixed number of ranges
sometimes isn't good enough. As there's a need to implement a new
histogram statistics for the incoming query times (RTT), it was decided
to also update the existing RTT statistics of the outgoing queries
so that they look similar and use common code.
Remove the old histogram code from the resolver and from the statistics
channel. Reimplement the outgoing queries RTT histogram using the
isc_histomulti module, and prepare the necessary base for implementing
the incoming queries RTT histogram. The statistics channel will be
updated to expose the new histograms in an upcoming commit.
NSEC3 hashes are required to fit within a single DNS label. Since there
are 5 bits per label byte without pad characters, the maximum hash size
is floor(63*5/8) (39 bytes).
This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms. When the parent has both supported and
unsupported algorithm in the DS record, the validator would treat the
supported DS algorithm as insecure when validating DNSKEY records
instead of BOGUS. This has not security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
The count of items was stored in the raw data as first two bytes.
Instead of reading this from the raw header, move the number of the
items into the structure itself.
This needs the flexible member raw[] to be aligned on the size of the
pointer to prevent unaligned access to the start of the header from
rdataset_getheader() function that casts the raw[] to dns_slabheader_t.
After the rdataslab -> rdataslab,rdatavec split, there were couple of
unused struct members. Remove all the unused members, reorder the
members to eliminate the padding holes and thus reduce the
dns_slabheader_t and dns_slabtop_t structure sizes.
After the split to dns_rdataslab and dns_rdatavec, the
dns_rdataslab_merge() function was unused and it suffered from the same
data race as fixed in the previous commit. Instead of fixing it, just
remove the function and bunch of other unused functions from the
dns_rdataslab unit.
The fetch loop detection occured in two places: when
`dns_resolver_createfetch()` is invoked (looking up through the parent
fetches chain and stops the fetch if a parent fetch is the same qname and
qtype) and right after calling `dns_adb_findname()` in the resolver
(stops the fetch if the current fetch is the same name from the ADB
lookup, and ADB lookup needs to fetch it).
Regarding fetch loop detection at the `dns_resulver_createfetch()`
entry, there are case where both qname and qtype are similar but the
zonecut is different. This will then query different name servers and
get different responses. For instance, the following delegation
parent-side (both for `foo.example.` and `dnshost.example.`):
foo.example. 3600 NS ns.dnshost.example.
dnshost.example. 3600 NS ns.dnshost.example.
ns.dnshost.example. 3600 A 1.2.3.4
Then the child-side of `dnshost.example.`:
dnshost.example. 300 NS ns.dnshost.example.
ns.dnshost.example. 300 A 1.2.3.4
Then the child-side of `foo.example.`:
foo.example 3600 NS ns.dnshost.example.
a.foo.example 300 A 5.6.7.8
Obviously, there is a misconfiguration between the parent-side and the
child-side of `dnshost.example` (the mismatch of the TTL), but, this
happens...
Because the resolver is currently child-centric, the parent-side
delegation's glue of `dnshost.example.` will be overriden by the
child-side of the delegation. Once both A records will expires, the
resolver will attempt to find out the A RRs but will start from the
`foo.example.` zonecut, as the delegation itself is still valid.
Then the resolver will attempt to resolve `ns.dnshost.example.`, still
using the `foo.example.` zonecut, which will immediately trigger another
attempt to resolve `ns.foo.example.` (because the A RR is expired). This
is, however _not_ a loop, because the second attempt will have
`dnshost.example.` zonecut. And this changes everything, because the
resolver detects the A name is in-domain, and pass a flag to ADB so
`dns_view_find()` won't use the cache. As a result, the zonecut will be
`.`, and the hints (root servers) will be queried instead.
From that point, they'll return the parent-side delegation, which
includes the glue for `ns.dnshost.example/A`, and the resolution can
continue. Previously, this wouldn't be possible because a loop would be
detected from the second attempt to looking `ns.foo.example/A` and would
result in a SERVFAIL.
Now, the loop detection is relaxed as the loop is detected if the qname,
qtype _and_ zonecut are equals.
This commit also changes the way the loop detection post
`dns_adb_createfind()` works. From the same example above, there would
be two ADB fetches with the same name, but with two different ADB flags
(the first one without DNS_ADB_STARTATZONE, the second one with that
flag). It means that there will be two fetches out of those two ADB
lookups, both legit, and not a loop (i.e. it won't be stuck). To
differenciate between a find which has a pending fetch (which could be
from another find the current find has been attached to), a new find
option `DNS_ADBFIND_STARTEDFETCH` is introduced, which tells that the
current has did started a fetch.
That way, if a find doesn't have `DNS_ADBFIND_STARTEDFETCH` option but
has pending fetches, we know this is a find attached to a similar find
so this is a loop. Otherwise, with `DNS_ADBFIND_STARTEDFETCH`, we know
that even if there is a pending fetch, this is not a loop as the fetch
has just been started
`dns_view_findzonecut()` is used only in the context where the closest
name servers for a name need to be queried. In the future, this API
will also return the glues (if known) for those name servers, as well
as (exclusively, if both NS and DELEG exist) the DELEG record.
To avoid ambiguities with other code flows using `dns_db_findzonecut()`,
`dns_view_findzonecut()` has been renamed into `dns_view_bestzonecut()`.
Since the `sigrdataset` "output" parameter of `dns_view_findzonecut()`
is never used (always called with NULL), it is now removed.
Also, since the resolver is moving towards a parent-centric direction,
there is no point having a signature for the NS record (which is not
authoritative in the parent, so never signed) in the contextes where
`dns_view_findzonecut()` is called.
As `dns_view_findzonecut()` only returns either ISC_R_SUCCESS or
DNS_R_NXDOMAIN, and since it automatically disassociates the rdatasets
in case of failure, some call sites are simplified.
Add a type to all dns_zone_(get|set) functions that apply to sending
notifies, so the options can be set and retrieved separately per type.
This affects dns_zone_setnotifydefer, dns_zone_getnotifydefer,
dns_zone_setnotifydelay, dns_zone_getnotifydelay,
dns_zone_setnotifysrc4, and dns_zone_setnotifysrc6.
The functions dns_zone_getnotifysrc4 and dns_zone_getnotifysrc6 are
unused and can be removed.
Use a static dns_name_t for the "_dsync" label. Remove some
unnecessary dns_fixedname_t variables. Remove unnecessary dsyncname
dns_name_t from dns_dsyncfetch and rename dns_fixedname_t fname to
dsyncname.
When the CDS/CDNSKEY RRset gets updated, schedule a NOTIFY(CDS) to be
sent to the parental agent. The parental agent is published in the
parent zone as a DSYNC RRset, so first we need to figure out the
parent owner name. This is done by finding the zonecut (querying for
NS RRset until we find a postive answer).
In nsfetch_dsync, we then schedule a zone fetch for the DSYNC record
at <child-labels>._dsync.<parent-labels>. Then we queue the notify
for each target in the DSYNC records that matches the NOTIFY scheme
and CDS RRtype.
With Generalized DNS Notifications, a zone may need to send different
type of NOTIFY messages for different reasons. When creating a new
notify, allow for specifying the type.
The DSYNC record has a Port rdata field, so NOTIFY(CDS) messages may be
configured at different ports. When creating a new notify, allow for
specifying the port.
With Generalized DNS Notifications, a zone may need to send different
NOTIFY messages for different reasons. Introduce a method to
initialize a notify context and maintain a notify contexts per RRtype.
Update the functions 'dns_dnssec_syncupdate()' and
'dns_dnssec_syncdelete()' to make a distinction between a changed RRset
and no changes made.
The return code will be used later to determine if we need to send a
NOTIFY(CDS) to DSYNC endpoints.
A catalog zone is updated in an offloaded thread, which is not
stopped during a reconfiguration in an exclusive mode, and so
can cause a race condition with it.
Waiting for the offloaded threads to complete their work before
entering into the exclusive mode can potentially cause unwanted
delays, because offloaded threads are generally "allowed" to take
a longer amount of time before they complete.
Add a dns_catz_zone_prereconfig()/dns_catz_zone_postreconfig() pair
of functions which currently just lock the catalog zone when
reconfiguring it. The change should eliminate the race.
As a side note, there was already a similar pair of functions,
dns_catz_prereconfig() and dns_catz_postreconfig() which are called
before and after reconfiguring a 'dns_catz_zones_t' object.
Below are the stack traces of the reconfiguration thread which has
asserted, and a catalog zone update thread which was caught in the
middle of its work despite the fact that the exclusive mode is
turned on.
Stack trace of thread 23859:
#0 0x00007f80e7b8e52f raise (libc.so.6)
#1 0x00007f80e7b61e65 abort (libc.so.6)
#2 0x0000000000422558 assertion_failed (named)
#3 0x00007f80eaa6799e isc_assertion_failed (libisc-9.18.41.so)
#4 0x00007f80ea5bc788 dns_catz_entry_getname (libdns-9.18.41.so)
#5 0x000000000042ce0e catz_reconfigure (named)
#6 0x000000000042d3c5 configure_catz_zone (named)
#7 0x000000000042d7a4 configure_catz (named)
#8 0x0000000000430645 configure_view (named)
#9 0x000000000043d998 load_configuration (named)
#10 0x000000000044184f loadconfig (named)
#11 0x0000000000442525 named_server_reconfigcommand (named)
#12 0x000000000041b277 named_control_docommand (named)
#13 0x000000000041c74a control_command (named)
#14 0x00007f80eaa912ae task_run (libisc-9.18.41.so)
#15 0x00007f80eaa914cd isc_task_run (libisc-9.18.41.so)
#16 0x00007f80eaa46435 isc__nm_async_task (libisc-9.18.41.so)
#17 0x00007f80eaa467aa process_netievent (libisc-9.18.41.so)
#18 0x00007f80eaa475a6 process_queue (libisc-9.18.41.so)
#19 0x00007f80eaa46227 process_all_queues (libisc-9.18.41.so)
#20 0x00007f80eaa462a1 async_cb (libisc-9.18.41.so)
#21 0x00007f80e8d01893 uv__async_io.part.3 (libuv.so.1)
#22 0x00007f80e8d13ac4 uv__io_poll (libuv.so.1)
#23 0x00007f80e8d023fb uv_run (libuv.so.1)
#24 0x00007f80eaa45ced nm_thread (libisc-9.18.41.so)
#25 0x00007f80eaa9bda3 isc__trampoline_run (libisc-9.18.41.so)
#26 0x00007f80e7f1e1ca start_thread (libpthread.so.0)
#27 0x00007f80e7b798d3 __clone (libc.so.6)
...
...
Stack trace of thread 23912:
#0 0x00007f80ea5bc2da dns_catz_options_setdefault (libdns-9.18.41.so)
#1 0x00007f80ea5bd411 dns__catz_zones_merge (libdns-9.18.41.so)
#2 0x00007f80ea5c3c2f dns__catz_update_cb (libdns-9.18.41.so)
#3 0x00007f80eaa4fee9 isc__nm_work_run (libisc-9.18.41.so)
#4 0x00007f80eaa9bda3 isc__trampoline_run (libisc-9.18.41.so)
#5 0x00007f80eaa4ff48 isc__nm_work_cb (libisc-9.18.41.so)
#6 0x00007f80e8cfc75e worker (libuv.so.1)
#7 0x00007f80e7f1e1ca start_thread (libpthread.so.0)
#8 0x00007f80e7b798d3 __clone (libc.so.6)
We had a common pattern in the code that looks like this:
if (dns_rdataset_isassociated(rdataset)) {
dns_rdataset_disassociate(rdataset);
}
add a helper macro that checks for rdataset != NULL and the above
called dns_rdataset_cleanup(rdataset).
The bitset packing of the resign_lsb and heap_index in struct vecheader
was causing a race condition, since both bindrdataset and heap
operations tried to access the same byte (even though they are accessing
different fields).
While heap operations are protected by the node lock of the header being
inserted, they aren't protected by the node locks of the headers being
displaced, leading to the race condition.
This commit fixes the issue by reverting the struct packing
optimization.
This is a new seek function for dbiterator that is meant to find an
NSEC3 node in a zone database. The difference with dns_dbiterator_seek
is that if the node does not exist, this seek function will point the
iterator to the next NSEC3 name.
Add an implementation of rdataset specialized for authoritative
workloads. For now, it is a copy of rdataslab, with redundant fields
from the header removed.
The removal of the foundname and name parameters from various qp.c
functions led to formatting issues. Restore the correct formatting via
clang-format.
Outside of unit tests, the name parameter in dns_qpiter_<...> and
dns_qpchain_<...> is only used in context where the name can be
extracted directly from the underlying node.
This commits modifies the signatures of dns_qpiter_<...> and
dns_qpchain_<...> not to have a name parameter. Where the name parameter
was needed, we now query the node and copy the name directly from it.
This allows us to remove maybe_set_name from qp.c. Besides simplifying
the API, this leads to a performance speedup for NXDOMAIN handling,
as we avoid calling maybe_set_name inside step, and maybe_set_name is
very inefficient.
A copy of the implementation maybe_set_name is retained for the unit
tests.
The `foundname` parameter in dns_qp_lookup is used only in the unit
tests. This commit simplifies the API by removing it, and modifying the
unit tests to extract the name from pval.
This commit implements a batch update function for qpzone. The main
reason for this is speed: using addrdataset would cause a qp transaction
per rrdataset added, leading to a substantial slowdown compared to
RBTDB. The new API results in a qp transaction per applied diff.
This commit adds a layer of indirection to the apply_diff logic used by
IXFR and resigning by having the database updates go through a vtable.
We do this in three steps:
- We extend dns_rdatacallbacks_t vtable to allow subtraction and
resigning.
- We add a new set of api (begin|commit|abort)update to the dbmethods
vtable, that model an incremental update that can be aborted.
- We extract the core logic of diff_apply into a function that
satisfies the new interface.
- We make diff_apply use this new function, and log the results.
The intent of this commit is to allow databases to expose a batch
incremental update implementation, just like they expose a custom
batch creation implementation through (begin|end)load.
Instead of just flagging the qpcache node to be dirty, add the headers
to be cleaned to the dirty list and when cleaning the node, only walk
through the dirty node, not all the slabtops.
Document the way `__attribute__((__constructor__))` and
`__attribute__((__destructor__))` must be used in BIND9 libraries in
order to avoid unexpected behaviors with other third-party libraries.
In commit aea251f3bc, `isc_buffer_reserve()` was changed to
take a simple `isc_buffer_t *` instead of `isc_buffer_t **`.
A number of functions calling it have now been similarly
modified.
Wrap 'dns_keymgr_status()' in 'dns_zone_dnssecstatus()' so we can easily
retrieve the zone string name and refresh key time value.
In addition to the current time, output when the next key event is
expected.
Don't log keys that are completely hidden unless verbose is set.
Don't log key state values unless verbose is set, or they are in a
weird state.
For expected key states, log a more useful message of the stage of
the rollover. If we are in the middle of a key rollover, don't log
when the next key rollover is scheduled.
Condense the output for better readability.