Flushing the name when NTA expires causes problems for the ongoing
resolving process. Do not flush the name from the cache. Instead,
the resolver should do the flushing (this is planned to be merged
next).
After KeyTrap, the temporal DNSSEC were originally hard errors that
caused validation failures even if the records had another valid
signature. This has been changed and the RRSIGs outside of the
inception and expiration time are not counted as hard errors. However,
these errors are not even counted as validation attempts, so excessive
number of expired RRSIGs would cause some non-cryptograhic extra work
for the validator. This has been fixed and the temporal errors are
correctly counted as validation attempts.
The prescan and main update loops in DNS UPDATE processing both used the
same counter to index the maxbytype[] quota array. The prescan loop
always incremented the counter, but the main loop had 14 continue paths
that skipped the increment. This allowed an authenticated DDNS client to
craft an UPDATE message with padding records (e.g. CNAME+A pairs that
trigger CNAME-conflict skips) to shift the counter and read wrong quota
entries, bypassing per-type record limits entirely.
Fix by incrementing the counter unconditionally at the start of each
iteration in the main loop.
Allow empty APL records because RFC 3123 (Section 4) says "zero or
more items". This fixes processing of a catalog zone ACL (which is
based on APL records) when the zone contains an empty APL record or
when a zone update arrives which creates an empty APL record.
Since it is impossible to increase an isc_statsmulti counter and
retrieve the new counter atomically, and we need the output of
recursclients in order to compute ns_highwater_recursive, we change the
recursclients counter to an isc_stats one.
In the current statistics counter implementation, the statistics are
backed by an array of counters, which are updated via atomic operations.
This leads to contention, especially on high core count
machines.
This commit introduces a new isc_statsmulti_t counter that keeps a
separate array per thread. These counters are then aggregated only when
statistics are queried, shifting work off the critical path.
These changes lead to a ~2% improvement in perflab.
This adds the switch +[no]cookie to delv to control the sending of
DNS COOKIE options when sending requests. The default is to send
DNS COOKIE options.
previously, rpz_rrset_find() behaved differently depending on whether
a cache lookup returned DNS_R_DELEGATION or ISC_R_NOTFOUND. the former
indicates the presence of a cached NS rrset, and the latter indicates
that the cache is cold or that all NS rrsets above the query name have
expired. both results indicate that the caller should recurse, but
rpz_rrset_find() only recursed in the case of DNS_R_DELEGATION.
the nsip-wait-recurse and nsdname-wait-recurse test cases in the
rpzrecurse system test were dependent on this misbehavior. the test
server was configured with a lame delegation, so that recursion always
failed, but once the lame delegation was expired due to a zero TTL, the
cache returned ISC_R_NOTFOUND, which caused the recursion not to be
attempted. the test seemed to be observing a delay before recursion
succeeded, but it was actually observing a delay before recursion was
skipped. fixing this bug caused the test to fail.
the test server has now been reconfigured so that recursion succeeds
after a delay, instead of failing. now we're able to test that
we're waiting for the successful completion of recursion.
Race rndc reconfig (toggling between allow-update and update-policy)
against a stream of DNS UPDATEs for 5 seconds and verify that named
does not crash.
Before the fix, the race between send_update() and update_action()
reading the SSU table independently could trigger an assertion
failure (INSIST) when the zone's update policy changed between the
two reads.
Pass the SSU table through the update event struct from
send_update() to update_action() instead of reading it from the
zone twice. If rndc reconfig changed the zone's update policy
between the two reads (e.g., from allow-update to update-policy),
send_update() would skip the maxbytype allocation but
update_action() would see a non-NULL ssutable, triggering
INSIST(ssutable == NULL || maxbytype != NULL) and crashing named.
The ssutable reference is now taken once in send_update() and
transferred to update_action() via the event struct, ensuring
both functions see the same value.
Replace the local SAVE(), RESTORE(), and INITANDSAVE() macros in
query.c with the project-wide MOVE_OWNERSHIP() macro. The new
form is clearer about the intent: ownership of a pointer is being
transferred from source to destination, with the source set to NULL.
SAVE and RESTORE were identical macros with different names used to
indicate the direction of transfer, but this distinction was purely
cosmetic. INITANDSAVE additionally set the destination to NULL
first, which is unnecessary because the preceding memcpy already
initialized all fields from the source struct.
A helper macro that returns the current value of a pointer and sets
it to NULL in one expression, useful for transferring ownership in
designated initializers.
dns_view_flushnode() was called in the delete_expired() async
callback, which runs after the query that detected the NTA expiry.
This created a race: the query would proceed with stale cached data
from the NTA period before the flush had a chance to run, resulting
in transient SERVFAIL with EDE 22 (No Reachable Authority).
Move dns_view_flushnode() into dns_ntatable_covered() so the cache
is flushed synchronously when the expiry is detected, before the
query continues.
Also simplify the expiry comparison in delete_expired() to a direct
pointer comparison (nta == pval) instead of comparing expiry
timestamps.
The SRTT update loaded the old value, computed a new one, and stored it
back as separate operations. Two concurrent callers could each read the
same old value and one update would be silently lost.
Use a CAS loop for the read-modify-write on entry->srtt. For the aging
path, also CAS on entry->lastage to prevent multiple threads from aging
the same entry in the same second.
Move the write to fctx->vresult after LOCK(&fctx->lock). The field was
being set before acquiring the lock, but dns_resolver_logfetch() reads
it under the same lock from another thread.
The liburcu rcu_cmpxchg_pointer() uses CMM_RELAXED ordering on the CAS
failure path. When a thread loses the CAS and gets another thread's
pointer back, reading fields through that pointer is a data race on
weakly-ordered architectures (ARM, POWER) because the failing load has
no acquire semantics.
Override rcu_cmpxchg_pointer() and rcu_xchg_pointer() to use standard
__atomic builtins with __ATOMIC_ACQ_REL (success) and __ATOMIC_ACQUIRE
(failure) ordering. This fixes the race on all architectures and is
natively visible to ThreadSanitizer.
Use CMM_LOAD_SHARED/CMM_STORE_SHARED for nta->expiry, which is
written from the NTA's owning loop but read from any loop (validator,
rndc status, rndc nta -dump).
Also dispatch delete_expired to the NTA's owning loop rather than
the caller's loop.
Replace the ntatable rwlock with RCU read-side critical sections.
The QP multi trie already provides its own concurrency control for
reads and writes, making the rwlock redundant. NTA fields like
expiry are only accessed from the NTA's own event loop thread, so
no additional synchronization is needed.
The table shutdown is now deferred via call_rcu to ensure all
read-side critical sections have completed before iterating and
shutting down individual NTAs.
The 'env' pointer is passed to an async function without taking
a reference first, which can potentially cause a use-after-free
error. Take a reference, then detach in the async function.
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated. This makes the buffer
believe it has more space than actually allocated. A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.
Pass h2->content_length as the capacity to match the allocation.
Drop the NZF (New Zone File) fallback for persisting runtime zone
configurations, making LMDB (NZD) the only storage backend. This
removes all #ifdef HAVE_LMDB conditionals, the meson 'lmdb' option,
and the NZF-related functions. LMDB is now a mandatory build
dependency.
The named-nzd2nzf tool is now always built.
The error cleanup in fctx_create() was missing isc_mutex_destroy() and
dns_ede_invalidate() calls. When error paths (cleanup_nameservers,
cleanup_fcount, cleanup_qmessage, cleanup_adb) were taken after the
mutex and edectx were initialized, the fctx memory was freed without
properly destroying these resources first.
The previous code was incorrectly clearing errno after calling
strtol but before testing the result rather than clearing it and
then calling strtol so that changes to errno can be correctly
determined.
We return DNS_R_NOVALIDSIG if we detected a deadlock. Then in
'validate_async_done()', this result value is used to check if we
need to fall back to insecure. As part of that we create a new fetch
but that fails because of the detected deadlock. This results in a loop
of deadlock detected, fallback to insecure, deadlock detected, ...
Add a new result value, ISC_R_DEADLOCK, and return this instead when
we have detected a deadlock. This will be treated as a generic error,
as there is no special handling for this result value.
The struct dns_ixfr was defined as part of struct dns_xfrin, probably
because at some point it was an anonymous struct and then it was changed
to named struct with typedef at the top. Move the definition from
struct dns_xfrin into and fold into the typedef ... dns_ixfr_t.
After merging the NORMAL, NSEC and NSEC3 tree into single QP tree, there
were some comments still speaking about auxiliary NSEC tree. These were
cleaned up and the logic when we pass the qp tree (write transaction) to
qpzone_addrdataset_inner() was changed to be more obvious that this is
needed only when we are adding NSEC records.
Move the LIBDNS_XFRIN_RECV_DONE probe execution before dns_xfrin_detach
in xfrin_recv_done.
Previously, dns_xfrin_detach was called before the trace probe, which
could free the xfr object. Because the accessed member xfr->info is an
embedded array, the expression evaluates via pointer arithmetic rather
than a direct memory dereference. Although this prevents a reliable
crash in practice, it technically remains a use-after-free issue.
Reorder the statements to ensure the transfer context is fully valid
when the probe executes.
Starting from OpenSSL 4 the the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
Since memory allocation never fails in BIND 9, checkds_create() cannot
fail. Change it to return void and use designated initializers,
removing error handling at all call sites.
Two 'goto next' paths in zone_notify() skipped detaching the TSIG
key and transport, leaking them on TLS configuration failure and
when the destination address is disabled.
When the dns_qp_getname() call returns an error the del_name() function
just returns without cleaning up the trasnaction.
Instead of returning, jump to a new label 'done:' similar to the code
written in the add_nm() function.
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.
Fix the second pair to check RADIX_V6.
A few port validation checks use >= UINT16_MAX instead of > UINT16_MAX,
incorrectly rejecting port 65535 as out of range. Port 65535 is a valid
TCP/UDP port number. Other port checks in the same file already use the
correct > comparison.
When defaults->zonedir is set, opts->zonedir is unconditionally
overwritten without freeing the previous value. This leaks memory
on every catalog zone update when zonedir defaults are configured.
Free the existing opts->zonedir before replacing it.
Refactor dns_loadctx_t and dns_dumpctx_t to use standard
ISC_REFCOUNT_DECL and ISC_REFCOUNT_IMPL macros, retiring the
redundant manual attach and detach implementations.
Introduce dns_loadctx_enqueue() and dns_dumpctx_enqueue() to
ensure compliance with the new strict loop affinity in
isc_work_enqueue(). If the current loop does not match the
target loop, the enqueue operation is safely bounced to the
correct thread via isc_async_run().
Add a REQUIRE(isc_loop() == loop) assertion to isc_work_enqueue()
to strictly enforce that work is enqueued from the loop it is
assigned to. This loudly prohibits cross-thread queue manipulation
before it inevitably turns into a concurrency debugging nightmare.
The 'keyname' variable could be used in the add_rdata_to_list()
call without being initialized. Make sure that 'keyname' is non-NULL
for all the cases that do not jump to the 'cleanup:' label.
The attacker that controls DNSSEC-signed zone can trigger a memory leak
in the addnoqname() and/or addclosest() by creating more than
max-records-per-type RRSIG for any NSEC records. The memory leaks have
been fixed.