Add a delegdb test which dump a database which contains a very long name
(using DNS master file format with escape sequence as defined per RFC
1035). This ensure that the delegdb uses large enough internal buffers
to load the names in DB and generate the dump. If this is not the case,
the test crashes on a build with address sanatizer enabled.
Fail at meson configure time if a *_test.c file exists in a test
directory but is not listed in the corresponding test array. This
prevents test files from being silently orphaned when added without
updating meson.build, as happened with diff_test.c and skr_test.c.
Assisted-by: Claude:claude-opus-4-7
Both test files existed on disk but were never added to the meson test
list when the build system switched from autoconf.
skr_test.c also had a spurious #include <dns/tls.h> for a header that
never existed in this repo -- no symbols from it were used. Removing
the include is the only fix needed; the test itself is correct and
passes.
Assisted-by: Claude:claude-opus-4-7
A resolver that validated DNSSEC accepted RSA DNSKEYs of any modulus
size up to OpenSSL's compile-time ceiling, and accepted any public
exponent the wire format could carry. RSA verification cost grows
sharply with the modulus length, so an authoritative server could
publish an oversized DNSKEY to make each signature check on the
resolver many times more expensive than for a normally sized key.
The intended verify-time cap had no effect because the helper it called
returned the public-exponent bit length rather than the modulus bit
length, so the test was always satisfied. Replace it with an honest
modulus-range check and a stricter exponent check that accepts only odd
exponents in the closed range [3, 2^32 + 1] (covering every Fermat
prime up to F5 and the odd intermediate values seen in deployed keys),
reject anything outside those bounds at every RSA key load path so an
invalid key never reaches the verifier, and keep the same checks at the
verifier as a backstop against future load paths.
Every fuzz target depended on libtest_dep, which forces building the
libbindtest shared library. In a static build (as used by OSS-Fuzz)
that link fails: libbindtest's netmgr wrappers multiply-define symbols
that also live in the static libisc/libns archives, and the static
system libraries are not position independent.
Only fuzz_dns_qp actually uses the qp test helpers, so give it just
tests/libtest/qp.c via the new libtest_qp_dep and drop libtest_dep
from the fuzzers.
Assisted-by: Claude:claude-opus-4-8
The delegation database kept one SIEVE LRU list per loop so that node
eviction could run lock-free on each node's owning loop; this required
every node to hold a loop reference and to defer its own destruction to
that loop via isc_async_run(). Move the SIEVE unlink into the QP write
transaction, taking the evicted node directly from dns_qp_deletename(),
which serialises every list mutation under the qpmulti writer lock and
lets a single shared list replace the per-loop arrays. Node and database
teardown are now synchronous.
The QP trie and the SIEVE list are wrapped in a reference-counted holder.
Each node keeps a reference to the holder so it (and its memory context)
stays valid until the node is destroyed, while shutdown drains the SIEVE
and destroys the trie from an RCU callback and frees the holder once the
last node drops its reference. Reuse across a reconfiguration now moves
ownership of the holder to the new view instead of sharing it through a
separate owners counter, so dns_delegdb_reuse() is removed.
Instead of having independent APIs to configure various aspects of the
delegdb (i.e. cache size, other settings that may come up later), a
single configuration struct is passed to `dns_delegdb_setconfig()`, which
internally does all the plumbing. To avoid relying on
atomics/synchronization, `dns_delegdb_setconfig()` must be called from
exclusive mode (for now).
The configuration can be retrieved at any time (not necessarily from
exclusive mode) using `dns_delegdb_getconfig()`. This is useful, for
instance, to flush the delegdb without losing its parameters.
After SIG and NXT lost their special handling, KEY remained the only
RFC 2535-era type still receiving coexistence allowances: KEY
alongside CNAME at the same owner, KEY answered from the parent side
of a zone cut, KEY kept across CNAME eviction in the cache. RFC 3755
retains type 25 only for SIG(0) and TKEY transaction signatures, and
neither relies on those allowances in practice. The in-tree comment
that flagged the RFC 3007 parent-side carve-out as "unclear" predicted
this cleanup.
Zones that publish CNAME and KEY at the same owner — already invalid
under RFC 2181 — now fail to load. System test fixtures are updated
accordingly, and a new test asserts that SIG, NXT, and KEY records
pick up covering RRSIGs when their zone is signed.
POSIX does not require localtime_r() to behave as if tzset() was called,
so the TZ environment change isn't picked up if some library has already
primed libc's tz cache. Loading pkcs11-provider during OpenSSL init
does exactly that, causing the time and dnstap cmocka tests to format
timestamps in UTC instead of the requested zone.
Assisted-by: Claude:claude-opus-4-7
Each address entry stored by dns_delegset_addaddr() is an
isc_netaddrlink_t, whose size depends on sizeof(void *) via the
ISC_LINK macro (24 bytes of address + two prev/next pointers): 40
bytes on 64-bit, 32 bytes on 32-bit. The hardcoded 4 MB / 8 MB
ranges only held on 64-bit, so dns_deleg_cleanuptests failed on
armv7l with isc_mem_inuse() returning ~3.2 MB.
Express the expected ranges in terms of sizeof(isc_netaddrlink_t)
so they scale with pointer width, and pull the 99999 entry count
out into a NENTRIES macro.
Assisted-by: Claude:claude-opus-4-7
Two TSIG-authenticated TKEY DELETE queries for the same dynamic key,
arriving on different worker loops, could each enter
dns_tsigkey_delete() and cause over-decrementing the key refcount.
This has been fixed by making dns_tsigkey_delete() idempotent.
Until now, the dispatcher silently dropped UDP responses from the
expected peer that carried the wrong DNS message id and kept listening
for the correct id to arrive within the read timeout. An off-path
attacker who knows the destination address and source port of an
outgoing fetch could exploit that quiet retry window to flood the
resolver with guessed responses; with a gigabit link the per-query
success probability grows linearly with the number of guesses that
arrive before the legitimate answer or the timeout.
Treat any such mismatch as a possible spoofing attempt and let the
resolver immediately retry the same query over TCP, the same control
path the truncation handler already uses.
Add a resolver statistics counter - exposed as 'queries retried over TCP
after a response with mismatched query id' in rndc stats and
'MismatchTCP' in the statistics channel
Assisted-by: Claude:claude-opus-4-7
Replace the hysteretic hi_water/lo_water switch with a stochastic
check: always false below lo_water, always true at or above hi_water,
linearly ramped probability in between. This spreads cache cleaning
across many inserts instead of triggering a thundering herd once the
hi_water mark is crossed (which causes every addrdataset to enter the
LRU purge path simultaneously and serializes lookups behind the node
write locks).
The is_overmem atomic and its stores are no longer needed and are
removed. The existing tests that asserted specific hysteretic state
transitions are simplified to check only the deterministic boundaries.
makeslab(), makevec(), dns_rdatavec_merge() and dns_rdatavec_subtract()
summed per-record storage into an unsigned int with no upper-bound
check. An RRset whose total encoded size exceeds DNS_RDATA_MAXLENGTH
cannot fit in a DNS message and is unservable; building its in-memory
representation only burns memory on data that will fail at response
time, and at the upper bound the running sum could in theory wrap.
Cap the running total at DNS_RDATA_MAXLENGTH and return ISC_R_NOSPACE
when exceeded. Update the qpdb cache memory-purge test to use a
record size that fits within the new limit.
Assisted-by: Claude:claude-opus-4-7
RFC 3445 also eliminated the DNS_KEYTYPE_NOAUTH, DNS_KEYTYPE_NOCONF,
and DNS_KEYOWNER_ENTITY flags. With NOAUTH and NOCONF gone, the
concept of NOKEY can no longer be expressed in KEY records.
DNS_KEYOWNER_ENTITY was already unused as of 22d688f656 but still
defined; that is now also removed.
The wire-format RSA DNSKEY parser used the residual rdata length after
the exponent as the modulus length, with no positive lower bound. A
crafted DNSKEY whose declared exponent length consumed the whole buffer
produced n = 0; the BN_bin2bn(_, 0, _) returned a non-NULL BIGNUM, the
NULL-check passed, and dnssec-importkey -f wrote out a "valid" key with
no key material. RSASHA1 also bypassed the algorithm-specific lower
bound in opensslrsa_createctx (which only checks an upper bound for the
SHA1 algorithms), so the degenerate key reached the verify path with
whatever behaviour the linked OpenSSL exhibits for n = 0.
Add OPENSSLRSA_MIN_MODULUS_BITS = 512 (the lowest legitimate modulus
across the RSA DNSSEC algorithms per RFC 5702) and reject smaller
moduli at parse time in opensslrsa_fromdns, opensslrsa_parse, and
opensslrsa_fromlabel — the same three load paths where the existing
exponent upper-bound check lives.
Assisted-by: Claude:claude-opus-4-7
The wire-format RSA DNSKEY parser was the only key path with no upper
bound on the public exponent — opensslrsa_parse and opensslrsa_fromlabel
already cap at RSA_MAX_PUBEXP_BITS. An attacker-controlled DNSKEY could
therefore force a validator to compute s^e mod n with e up to ~|n| bits,
amplifying every verify by ~120x for typical 2048-bit moduli (OpenSSL
itself only caps the exponent for moduli above 3072 bits). Apply the
same bit-count cap to wire-format keys.
Assisted-by: Claude:claude-opus-4-7
The function existence-checked the target with stat() and then opened
the same path without O_NOFOLLOW, so a symlink at the target path
passed the regular-file test against the link's destination and the
open() that followed truncated and wrote through the link.
rndc-confgen -a is typically run as root and writes the keyfile under
a directory that service accounts may have write access to, so a stray
symlink there would silently redirect the truncate, fchown, and
overwrite to whatever file the link pointed at.
Switch the existence check to lstat() and use S_ISREG() so a symlink's
S_IFLNK mode is detected directly (a plain bitmask of S_IFREG matches
both, since S_IFLNK shares its high bit). Add O_NOFOLLOW to both
open() flag sets to close the lstat/open TOCTOU window. Hardening
against unexpected symlinks on intermediate path components is out of
scope.
Assisted-by: Claude:claude-opus-4-7
Instrument the delegation cache (introduced to back both NS-based and
DELEG-based delegations) with 11 USDT probes in the libdns provider so
that hit rate, eviction pressure, and lookup latency can be measured
without recompiling or enabling logging.
The probes are:
- delegdb_lookup_start / delegdb_lookup_done wrap dns_delegdb_lookup()
and pass the query name plus the result code.
- delegdb_insert_start / delegdb_insert_done wrap dns_delegset_insert().
The early SHUTTINGDOWN return is funneled through the cleanup label
so the done probe fires on every path.
- delegdb_cleanup_start / delegdb_cleanup_done bracket the SIEVE-based
eviction triggered when the cache goes overmem, reporting the number
of bytes requested and actually reclaimed. An additional per-node
delegdb_evict probe (guarded by _ENABLED() because it fires inside
the loop) exposes which zones are being evicted.
- delegdb_create, delegdb_reuse, and delegdb_shutdown trace the per-view
lifecycle across server reloads.
- delegdb_delete traces rndc flush-delegation paths, reporting whether
a subtree or single name was removed.
Name arguments are stringified with dns_name_format() behind
LIBDNS_*_ENABLED() guards so that the hot lookup and insert paths remain
zero-cost when no consumer is attached.
resign_sooner_values() only checked whether rhs was SOA-typed when
resign times were equal, but did not check lhs. When both entries were
SOA-typed with equal resign times, the comparison returned true in both
directions, violating irreflexivity and corrupting heap invariants.
Add lhs_typepair parameter and require lhs to be non-SOA for the
tie-breaking logic to apply.
The `DNS_DBFIND_NOEXACT` flag name is ambiguous, as it does not clearly
indicate the lookup behavior (e.g., sibling, child, or parent).
Rename it to `DNS_DBFIND_ABOVE` to better reflect that the lookup
targets a closer ancestor name.
dns_delegset_fromnsrdataset() used isc_g_mctx for the transient
delegset it builds from a DNS NS rdataset. That hides delegation
data in the global default context instead of accounting it against
the subsystem that owns it: a resolver fctx, a view, or a query
context.
Take an explicit mctx parameter so callers can direct the allocation
to the right place, and update the three call sites:
- lib/dns/view.c:1189 (dns_view_bestzonecut fallback) uses view->mctx
- lib/dns/resolver.c:7071 (resume_dslookup) uses fctx->mctx
- lib/ns/query.c:8672 (query_delegation_recurse) uses the client
manager's mctx
Also tighten delegdb cleanup to run inside the same write transaction
as the insert: delegdb_node_prepare() now returns the size of the new
node, and delegdb_cleanup() takes the caller's open qp so that the
overmem reclamation and the insert share one commit instead of doing
two nested write transactions.
dns__deleg_lookup() with DNS_DBFIND_NOEXACT is supposed to return
the deepest proper ancestor of the lookup name. It called
getparentnode() to step up from an exact match, but getparentnode()
only iterated while the chain length was >= 2. When the chain
contained a single entry (the exact match itself with no ancestor
stored in the trie), the loop did not execute and left the caller
looking at the exact match. The subsequent isactive() check then
returned success and the function reported the exact match as the
"deepest ancestor", violating NOEXACT semantics.
This was observable as the resolver picking the child-side
delegation for an at-parent type (e.g. a DS query for a TLD), then
sending the query to the child's own nameservers and recovering via
the "chase DS servers" path.
Have getparentnode() set '*node' to NULL when it cannot find an
active proper ancestor, and make dns__deleg_lookup() NULL-check
before returning, matching the canonical NOEXACT implementation in
dns_zt_find(). Update the deleg unit test to expect NOTFOUND for
the top-level-no-parent case.
This adds a new API call dns_zone_expandzonefie(), which will enable
named-checkconf to expand filenames the same way the server does in
dns_zone_setfile().
Previously, the user of dns_dispatch API had to first call
dns_dispatch_gettcp() and if that failed create a new TCP dispatch with
dns_dispatch_createtcp(). This has been changed and the TCP connection
reuse happens transparently inside dns_dispatch_createtcp(). There are
separate buckets for dns_resolver, dns_request and dns_xfrin units, so
these don't get mixed together.
In order to make zone.c more readable, we are splitting it up in
separate source files. This moves the zonemgr to its own file
("zonemgr.c").
Since this code accesses the zone structure directly, move the
'struct dns_zonemgr' and its prerequisites to "zone_p.h".
The helper functions 'forward_cancel()', 'zone_xfrdone()',
'zmgr_start_xfrin_ifquota()', and 'zmgr_resume_xfrs() need to be
internally accessible to both source files.
Note: This commit does not compile.
Replicating CI failures requires the developer to piece together the
sanitizer flags by hand, reducing ergonomics.
Fix this problem by embedding the relevant settings to the executables.
Symbol resolution still needs manual intervention by setting the env
variable `*SAN_SYMBOLIZER_PATH`. However, this doesn't affect any behavior.
Replace the separate pointer+length field pairs in dns_rdata_nsec3_t
(salt/salt_length, next/next_length, typebits/len) and
dns_rdata_nsec3param_t (salt/salt_length) with isc_region_t. This
makes the structs self-describing and eliminates a class of
length-mismatch bugs.
The dns_zone_setnsec3param() signature is updated to take
isc_region_t *salt instead of separate saltlen and salt arguments.
Function signatures for dns_nsec3_addnsec3, dns_db_getnsec3parameters,
and related internal functions still use separate pointer+length pairs
and should be updated in a follow-up.
In order to make zone.c more readable, we are splitting it up in
separate source files. This moves the set and get functions to its
own file ("zoneproperties.c").
Since this code accesses the zone structure directly, move the
'struct dns_zone' and its prerequisites to "zone_p.h".
The helper functions 'inline_raw()', 'inline_secure()',
'dns_zone_setview_helper()', 'zone_settimer(), 'set_resigntime()', and
'zone_freedbargs()' need to be internally accessible to both source
files.
A few set/get functions remain in zone.c for now:
- dns_zone_getserial
- dns_zone_getversion
- dns_zone_setviewcommit
- dns_zone_setviewrevert
- dns_zone_get_rpz_num
- dns_zone_set_parentcatz
- dns_zone_get_parentcatz
- dns_zone_setrawdata
- dns_zone_setskr
- dns_zone_getskrbundle
- dns_zone_setnsec3param
- dns_zone_setoption
- dns_zone_getoptions
- dns_zone_getrequesttransporttype
- dns_zone_getredirecttype
- dns__zone_getnotifyctx
- dns_zone_getgluecachestats
- dns_zone_setplugins
- dns_zone_setserial
- dns_zone_getxfr
- dns_zone_getkeystores
Now that we track the references at the vecheader level, binding an
rdataset is no longer guaranteed to keep its node alive. Therefore
remove the node pointer from the rdataset, and instead decide whether
glue is required by explicitely passing the owner name to addglue.
This commit adds a level of indirection to the signing operations.
Instead of being intrusive, the qpz_heap will keep track of which
headers must be resigned through a hashmap.
The intent is to make dns_vecheader_t entirely self-contained. In
particular, the ownership structure between the heap and the headers is
flipped. Before, the headers would "own" the heap, now the heap owns
the header.
Add `dns_delegdb_t`, a qpmulti-based database enabling to lookup a
delegation set (`dns_delegset_t`) from a zonecut name (`dns_name_t`). A
delegation set object essentially contains an expiration time and a list
of delegation (`dns_deleg_t`). Finally, a delegation can be either:
- A list of IP addresses (`isc_netaddrlist_t`), for NS-based delegation
providing glues or DELEG-based delegation using `server-ipv4=` or
`server-ipv6=`;
- Or a list of nameserver names, for NS-based delegation without glues,
or DELEG-based delegation using `server-name=`;
- Or a list of nameserver names, for DELEG-based delegation using
`include-delegparam=`.
The delegation database API provides lookup by closest zonecut,
delegation and delegation set builders as well as insertion of those
newly built delegation set, dumping to a `FILE *`, conversion from an NS
rdataset to a delegation set, deletion of a specific zonecut or all the
sub-tree of a given zonecut.
A memory context is internally used inside the delegation database and
can be constraint to a maximum size. Once it gets close to its maximum
size and a new delegation set is inserted into the database, a
reclamation flow is run internally removing the least recently used
entries.
The delegation set and delegation objects are, once they been inserted
into the database, read-only object. Thus, the caller can use them
without concurrency or locking concerns, and must detached them once its
done with it.
Since it is impossible to increase an isc_statsmulti counter and
retrieve the new counter atomically, and we need the output of
recursclients in order to compute ns_highwater_recursive, we change the
recursclients counter to an isc_stats one.
In the current statistics counter implementation, the statistics are
backed by an array of counters, which are updated via atomic operations.
This leads to contention, especially on high core count
machines.
This commit introduces a new isc_statsmulti_t counter that keeps a
separate array per thread. These counters are then aggregated only when
statistics are queried, shifting work off the critical path.
These changes lead to a ~2% improvement in perflab.
Add a REQUIRE(isc_loop() == loop) assertion to isc_work_enqueue()
to strictly enforce that work is enqueued from the loop it is
assigned to. This loudly prohibits cross-thread queue manipulation
before it inevitably turns into a concurrency debugging nightmare.
Adds text and wire format unit tests to verify the newly enforced
maximum NSEC3 hash length constraints. These tests ensure that hash
lengths up to the 39-byte maximum are accepted, while larger sizes
correctly fail.
NSEC3 hashes are required to fit within a single DNS label. Since there
are 5 bits per label byte without pad characters, the maximum hash size
is floor(63*5/8) (39 bytes).
This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
Add two short records to example.com.db that cause assertion failures
when converted to wire form.
The checks added to tests.sh are technically not required: the relevant
assertion failures are already hit when the zone is transferred out of
ns1.
Update the relevant unit tests with 1-byte records.
Co-authored-by: Mark Andrews <marka@isc.org>
Instead of the `EVP_MD_CTX` based functions, use either the new
`EVP_MAC` or the old `HMAC_CTX` based functions.
`EVP_MAC` is the recommended way using using MAC functions in post-3.0
while `HMAC_CTX` is used internally by `EVP_MD_CTX`, making the latter
redundant.
Get rid of the OpenSSL-isms that plague the codebase where the hash type
is `EVP_MD *`
By using a proper enum, alongside the cleanup, we also get the ability
to use constants for known hash sizes instead of having a function call
every time.
`EVP_MD_CTX_get0_md` has been removed instead of being adapted since it
wasn't used anymore.
These functions don't need to be called from multiple places and
by making them static we will detect when they are not added to the
list functions to be tested.