Commit graph

1147 commits

Author SHA1 Message Date
Alessio Podda
ed0ecb62e4 Add low contention stats counter
In the current statistics counter implementation, the statistics are
backed by an array of counters, which are updated via atomic operations.
This leads to contention, especially on high core count
machines.

This commit introduces a new isc_statsmulti_t counter that keeps a
separate array per thread. These counters are then aggregated only when
statistics are queried, shifting work off the critical path.

These changes lead to a ~2% improvement in perflab.
2026-03-26 10:19:25 +01:00
Ondřej Surý
a2bd833909 Fix data race on fctx->vresult in validated()
Move the write to fctx->vresult after LOCK(&fctx->lock).  The field was
being set before acquiring the lock, but dns_resolver_logfetch() reads
it under the same lock from another thread.
2026-03-20 00:56:19 +01:00
Ondřej Surý
5b1750f15f Fix missing mutex destroy and ede invalidate on fctx_create() error paths
The error cleanup in fctx_create() was missing isc_mutex_destroy() and
dns_ede_invalidate() calls. When error paths (cleanup_nameservers,
cleanup_fcount, cleanup_qmessage, cleanup_adb) were taken after the
mutex and edectx were initialized, the fctx memory was freed without
properly destroying these resources first.
2026-03-17 16:05:11 +01:00
Ondřej Surý
2da669490c Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference.

When dns_dispatch_connect() fails synchronously on TCP (e.g. from
dns_transport_get_tlsctx() failing in tcp_dispatch_connect()), the
connect callback is never scheduled, so the extra reference is never
consumed.  The error path then tears down the query via manual cleanup
(isc_mem_put) without going through the refcount destructor, leaving
the reference imbalanced.

Fix by dropping the extra reference on the error path, just after
dns_dispatch_done() which cleans up the dispatch entry.
2026-03-10 17:58:43 +01:00
Ondřej Surý
c1ba80169c
Introduce max-delegation-servers configuration option
Make the maximum number of processed delegation nameservers configurable
via the new 'max-delegation-servers' option (default: 13), replacing the
hardcoded NS_PROCESSING_LIMIT (20).

The default is reduced to 13 to precisely match the maximum number of
root servers that can fit into a classic 512-byte UDP payload.  This
provides a natural, historically sound cap that mitigates resource
exhaustion and amplification attacks from artificially inflated or
misconfigured delegations.

The configuration option is strictly bounded between 1 and 100 to ensure
resolver stability.
2026-03-04 16:13:49 +01:00
Aram Sargsyan
e41fbea843 Replace the outgoing queries RTT histogram code with isc_histomulti
The granularity of the simple histogram with fixed number of ranges
sometimes isn't good enough. As there's a need to implement a new
histogram statistics for the incoming query times (RTT), it was decided
to also update the existing RTT statistics of the outgoing queries
so that they look similar and use common code.

Remove the old histogram code from the resolver and from the statistics
channel. Reimplement the outgoing queries RTT histogram using the
isc_histomulti module, and prepare the necessary base for implementing
the incoming queries RTT histogram. The statistics channel will be
updated to expose the new histograms in an upcoming commit.
2026-02-26 14:00:10 +00:00
Ondřej Surý
3c33e7d937
Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.

The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates.  The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.

This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.

This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
2026-02-26 06:57:53 +01:00
Mark Andrews
b78052119a Remove determinist selection of nameserver
When selecting nameserver addresses to be looked up we where
always selecting them in dnssec name order from the start of
the nameserver rrset.  This could lead to resolution failure
despite there being address that could be resolved for the
other names.  Use a random starting point when selecting which
names to lookup.
2026-02-25 09:27:03 +01:00
Mark Andrews
38b626d58d Correctly identify forwarded queries with DNSTAP
Queries using forwarders where not being correctly identified
when using dnstap.
2026-02-17 13:17:43 +11:00
Colin Vidal
f623ab1fb3 fetch loop detection improvements
The fetch loop detection occured in two places: when
`dns_resolver_createfetch()` is invoked (looking up through the parent
fetches chain and stops the fetch if a parent fetch is the same qname and
qtype) and right after calling `dns_adb_findname()` in the resolver
(stops the fetch if the current fetch is the same name from the ADB
lookup, and ADB lookup needs to fetch it).

Regarding fetch loop detection at the `dns_resulver_createfetch()`
entry, there are case where both qname and qtype are similar but the
zonecut is different. This will then query different name servers and
get different responses. For instance, the following delegation
parent-side (both for `foo.example.` and `dnshost.example.`):

	foo.example.		3600	NS	ns.dnshost.example.
	dnshost.example.	3600	NS	ns.dnshost.example.
	ns.dnshost.example.	3600	A	1.2.3.4

Then the child-side of `dnshost.example.`:

	dnshost.example.	300	NS	ns.dnshost.example.
	ns.dnshost.example.	300	A	1.2.3.4

Then the child-side of `foo.example.`:

	foo.example		3600	NS	ns.dnshost.example.
	a.foo.example		300	A	5.6.7.8

Obviously, there is a misconfiguration between the parent-side and the
child-side of `dnshost.example` (the mismatch of the TTL), but, this
happens...

Because the resolver is currently child-centric, the parent-side
delegation's glue of `dnshost.example.` will be overriden by the
child-side of the delegation. Once both A records will expires, the
resolver will attempt to find out the A RRs but will start from the
`foo.example.` zonecut, as the delegation itself is still valid.

Then the resolver will attempt to resolve `ns.dnshost.example.`, still
using the `foo.example.` zonecut, which will immediately trigger another
attempt to resolve `ns.foo.example.` (because the A RR is expired). This
is, however _not_ a loop, because the second attempt will have
`dnshost.example.` zonecut.  And this changes everything, because the
resolver detects the A name is in-domain, and pass a flag to ADB so
`dns_view_find()` won't use the cache. As a result, the zonecut will be
`.`, and the hints (root servers) will be queried instead.

From that point, they'll return the parent-side delegation, which
includes the glue for `ns.dnshost.example/A`, and the resolution can
continue. Previously, this wouldn't be possible because a loop would be
detected from the second attempt to looking `ns.foo.example/A` and would
result in a SERVFAIL.

Now, the loop detection is relaxed as the loop is detected if the qname,
qtype _and_ zonecut are equals.

This commit also changes the way the loop detection post
`dns_adb_createfind()` works. From the same example above, there would
be two ADB fetches with the same name, but with two different ADB flags
(the first one without DNS_ADB_STARTATZONE, the second one with that
flag). It means that there will be two fetches out of those two ADB
lookups, both legit, and not a loop (i.e. it won't be stuck). To
differenciate between a find which has a pending fetch (which could be
from another find the current find has been attached to), a new find
option `DNS_ADBFIND_STARTEDFETCH` is introduced, which tells that the
current has did started a fetch.

That way, if a find doesn't have `DNS_ADBFIND_STARTEDFETCH` option but
has pending fetches, we know this is a find attached to a similar find
so this is a loop. Otherwise, with `DNS_ADBFIND_STARTEDFETCH`, we know
that even if there is a pending fetch, this is not a loop as the fetch
has just been started
2026-02-11 14:07:19 +01:00
Colin Vidal
e62cafd3c7 rename fetch response db field to cache
As the `dns_fetchresponse_t` `db` field can only be attached to the
resolver cache database, rename it into `cache` to avoid ambiguities.
2026-02-10 08:50:16 +01:00
Evan Hunt
feed0fb43c use a union for resp and qmin data
It's potentially confusing to use "resp_rdataset" for QNAME
minimization, but we can make it a union and have resp.rdataset
and qmin.rdataset using the same memory.

We can save even more space by using the same union to combine
qminname and resp_foundname and access them as qmin.name and
resp.foundname.
2026-02-10 08:50:16 +01:00
Colin Vidal
fd526c0ad0 resolver: remove qminrrset, qminsigrrset from fctx
Two rdataset property `qminrrset` and `qminsigrrset` are removed from
the fetch context. They only are used as temporary storage for the query
result of the qmin query, and are immediately detached from
`resume_qmin` once the query is over.

As an alternative, use `resp_rdataset` and `resp_sigrdataset`
instead; those are not needed for storing the response data until
after qmin_resume() is over.
2026-02-10 08:50:16 +01:00
Colin Vidal
5972ee2cd5 resolver: copy fetch responses and send events in one go
Instead of first copying query response data into each fetch response
and then iterating again to send the response to the caller, perform
both operations in one go.

Also removed some duplicate code.
2026-02-10 08:50:16 +01:00
Colin Vidal
a5b2a8c931 resolver: simplify fetch response handling
There is no longer a need to decide whether a fetch response should be
prepended or appended to the fetch response list. As query response data
is stored directly in the fetch context object, responses containing a
sigrdataset no longer need to be ordered first. Remove the code
implementing this logic.

Additionally, the distinction between `fetchstate_done` and
`fetchstate_sendevents` is no longer needed. New clients
`dns_fetchresponse_t` can be attached any time to the fetch context
until `fctx__done()` is called, since there is no dependency on the
first fetch response in the list. This simplifies the code and reduces
(just a bit) locking usage.
2026-02-10 08:50:16 +01:00
Colin Vidal
b764d43203 resolver: temporarily store query answer in fetch context
Query answers are now stored in dedicated fetch context properties,
instead of using `ISC_LIST_HEAD(fctx->resps)`.

This reduces lock critical section usage in some places, and enables
further simplifications. (In particular, it removes the need for special
logic to prepend a fetch response to the list when it contains a
sigrdataset.)
2026-02-10 08:50:16 +01:00
Colin Vidal
74a74b5f29 resolver: Defer cloning of fetch responses until events are sent
Instead of cloning fetch responses immediately after writing to the
head of the fetch response list, defer cloning until the events are
actually sent.

This removes the need for the `fctx->cloned` state. However, a new
fetch state value, fetchstate_sentevents, is introduced and occurs
after fetchstate_done.  To prevent new fetch responses from being
prepended after the head is written but before cloning occurs,
fetchstate_done is now set at all call sites that previously invoked
`clone_results()`.
2026-02-10 08:50:16 +01:00
Colin Vidal
bc1a66a1d0 resolver: add comment when recursing
When a fetch result gets a delegation, `rctx_referral()` sets the
`rctx->get_nameserver = true`, which tells the resolver to retry another
server, not because of an error with the current server, but simply to
follow the delegation.

Update the comment of `rctx_nextserver()` which is quite confusing here
(as it's not immediately obvious from the code how we recurse when
getting a delegation back from a query).

Also add a log line, which helps figuring out this is happening.
2026-01-22 07:31:00 +01:00
Colin Vidal
6e63d5d02a fix resolver query response doc
In case on positive response, the `rctx_authority_positive()` function
is called to scan the AUTHORITY section to find NS servers and related
RR (glues) to be cached.

The doc says the function was called `rctx_authority_scan()`, but it is
called `rctx_authority_positive()`.
2026-01-22 07:31:00 +01:00
Colin Vidal
e8b0d4749c rename dns_view_findzonecut() into dns_view_bestzonecut()
`dns_view_findzonecut()` is used only in the context where the closest
name servers for a name need to be queried.  In the future, this API
will also return the glues (if known) for those name servers, as well
as (exclusively, if both NS and DELEG exist) the DELEG record.

To avoid ambiguities with other code flows using `dns_db_findzonecut()`,
`dns_view_findzonecut()` has been renamed into `dns_view_bestzonecut()`.
2026-01-16 07:52:56 +01:00
Colin Vidal
18d6b94c1f remove sigrdataset from dns_view_findzonecut()
Since the `sigrdataset` "output" parameter of `dns_view_findzonecut()`
is never used (always called with NULL), it is now removed.

Also, since the resolver is moving towards a parent-centric direction,
there is no point having a signature for the NS record (which is not
authoritative in the parent, so never signed) in the contextes where
`dns_view_findzonecut()` is called.
2026-01-15 19:48:30 -08:00
Colin Vidal
e0d7bddc6c simplify usage of dns_view_findzonecut()
As `dns_view_findzonecut()` only returns either ISC_R_SUCCESS or
DNS_R_NXDOMAIN, and since it automatically disassociates the rdatasets
in case of failure, some call sites are simplified.
2026-01-08 20:26:32 +01:00
Ondřej Surý
bd074ff0ea
Cleanup the extra dns_rdataset_disassociate() code
Manually go through the code using dns_rdataset_isassociated() and
use dns_rdataset_cleanup() where appropriate in places that a simple
semantic patch is not able to find automatically.
2025-12-17 15:19:55 +01:00
Ondřej Surý
8320faf64b
Apply the dns_rdataset_cleanup patch through the codebase
Add a semantic patch to turn the conditional rdataset disassociate into
dns_rdataset_cleanup() call and run it.
2025-12-17 15:19:55 +01:00
Ondřej Surý
23ae5544be
Add more information to the rndc recursing output about fetches
It is possible to have a fetch that is active, but it has been cloned,
so it won't be used when found in the hash table.   The fetch options
also prevent matching in the hash table, so add a hexadecimal dump of
the fctx->options to the output.
2025-12-09 17:31:45 +01:00
Evan Hunt
d4ebea1037 use a standard CLEANUP macro
CLEANUP is a macro similar to CHECK but unconditional, jumping
to cleanup even if the result is ISC_R_SUCCESS. It is now used
in place of DST_RET, CLEANUP_WITH, and CHECK(<non-success constant>).
2025-12-03 13:45:43 -08:00
Evan Hunt
6b33b7fc77 switch to RETERR where it wasn't being used
replace all instances of the pattern:

        result = <statement>
        if (result != ISC_R_SUCCESS) {
                return result;
        }

with:

        RETERR(<statement>);
2025-12-03 13:45:43 -08:00
Evan Hunt
38e94cc7da switch to CHECK where it wasn't being used
replace all instances of the pattern:

        result = <statement>
        if (result != ISC_R_SUCCESS) {
                goto cleanup;
        }

with:

        CHECK(<statement>);
2025-12-03 13:45:42 -08:00
Evan Hunt
52bba5cc34 standardize CHECK and RETERR macros
previously, there were over 40 separate definitions of CHECK macros, of
which most used "goto cleanup", and the rest "goto failure" or "goto
out". there were another 10 definitions of RETERR, of which most were
identical to CHECK, but some simply returned a result code instead of
jumping to a cleanup label.

this has now been standardized throughout the code base: RETERR is for
returning an error code in the case of an error, and CHECK is for jumping
to a cleanup tag, which is now always called "cleanup". both macros are
defined in isc/util.h.
2025-12-03 13:26:28 -08:00
Evan Hunt
76b6fb3802 pass isc_buffer_t pointers when applicable
In commit aea251f3bc, `isc_buffer_reserve()` was changed to
take a simple `isc_buffer_t *` instead of `isc_buffer_t **`.
A number of functions calling it have now been similarly
modified.
2025-11-28 18:47:49 +00:00
Aram Sargsyan
ed7b08c0c4 Fix a bug where tlsctx_cache could be destroyed while still in use
When named is being reconfigured, it detaches from the old
'isc_tlsctx_cache_t' TLS context cache object and creates a
new one. This can cause an assertion failure within the
resolver when the object is destroyed while still in use,
because the resolver is using the object without getting
attached to it.

Add an attach/detach so that the 'isc_tlsctx_cache_t' doesn't
get destroyed while still being in use.
2025-11-27 16:45:55 +00:00
Ondřej Surý
4d307ac67a
Detect resolution loops between fetches
Maintain the relationship between the parent and child fetch and when
creating a new child fetch, properly check the resolution loops that
would lead to a new fetch would join one of the parent's fetch contexts.
2025-11-27 17:34:25 +01:00
Ondřej Surý
ed460c50b7
Change the QNAME minimization algorithm to follow the standard
In !9155, the QNAME minimization was changed to not leak the query type
to the parent name server.  This violates RFC 9156 Section 3, step (3)
and it is not necessary.  It also breaks some (weird) authoritative DNS
setups, especially when CNAMEs are involved.  Also there is really no
privacy leak with query type.
2025-11-27 16:47:29 +01:00
Ondřej Surý
3e971db1ed
Add optional debugging output for fetch context reference counting
As the fetch context reference counting was converted to userspace RCU
reference counting, the ability to debug the reference counting was
lost.  Restore the debugging by adding the optional compile-time enabled
debugging output again.
2025-11-27 10:39:23 +01:00
Evan Hunt
d5e4684b3d remove dns_message_buildopt
now that the EDNS state is stored within dns_message_t, it's no longer
necessary to have a public API call to build an opt rdataset; we can
just have dns_message_setopt() build the opt record internally.
2025-11-21 11:13:21 -08:00
Evan Hunt
2d3439ee02 add dns_message API to add EDNS options
The new dns_message_ednsinit() and dns_message_ednsaddopt() functions
allow EDNS options to be added to a message one at a time; it is no
longer necessary to construct a full array of EDNS options and set
them all at once.

This allows us to simplify EDNS option handling code, and in the
future it wlil allow plugins to add EDNS options to existing
messages.
2025-11-21 11:13:18 -08:00
Ondřej Surý
d51effdb48
Refactor fctx_getaddresses() into couple smaller functions
The fctx_getaddresses() was lengthy and little bit confusing with
goto statements.  Split the single function into smaller parts:
one for forwarders, one for nameservers and one for alternates.
2025-11-20 13:32:17 +01:00
Ondřej Surý
1b90d2ffdb
Reduce the number of outgoing queries
The dns_resolver mode of operation is to resolve all the domains as it
iterates the DNS tree to fill up the cache as quickly as possible.

This commit reduces the number of outgoing queries by reducing the
number of remote fetches started for the nameserver addresses resolution
via dns_adb_createfind() to a smaller number per depth of the recursion
since the delegation point (3 2 1 0) - where 0 means only create fetch
on demand if we don't have any addresses yet.
2025-11-20 13:31:11 +01:00
Evan Hunt
e62895e999
Remove maybe_cancel_validators() function
When shutting down an fctx, validators can just be canceled
without checking whether there are pending finds.
2025-11-08 18:00:32 +01:00
Petr Špaček
4e167e5c65 Remove leftover FCTX_ATTR_NEEDEDNS0
This is leftover from commit b6309ed962
which removed support for bitstring labels back in 2002.
2025-11-03 15:24:59 +00:00
Mark Andrews
012a47476d Fix "shutdown system test crashed in dns_dispatchmgr_getblackhole"
While shutting down view->dispatchmgr is no longer valid.  Attach
to it and when creates a fetch context and use that pointer instead
of view->dispatchmgr.  Use dns_view_getdispatchmgr to do the attaching
as view->dispatchmgr is it managed using rcu.
2025-10-28 09:02:12 +11:00
Michał Kępień
c2a672bbae Merge tag 'v9.21.14' 2025-10-22 18:13:34 +02:00
Ondřej Surý
94b4d105e8
Apply the changes from updated set_if_not_null semantic patch 2025-10-08 17:44:50 +02:00
Mark Andrews
2e40705c06
Retry lookups with unsigned DNAME over TCP
To prevent spoofed unsigned DNAME responses being accepted retry
response with unsigned DNAMEs over TCP if the response is not TSIG
signed or there isn't a good DNS CLIENT COOKIE.
2025-10-02 12:54:42 +02:00
Mark Andrews
a41054e9e6
Further restrict addresses that are cached when processing referrals
Use the owner name of the NS record as the bailwick apex name
when determining which additional records to cache, rather than
the name of the delegating zone (or a parent thereof).
2025-10-02 12:54:42 +02:00
Mark Andrews
fa153f791f
Tighten restrictions on caching NS RRsets in authority section
To prevent certain spoofing attacks, a new check has been added
to the existing rules for whether NS data can be cached: the owner
name of the NS RRset must be an ancestor of the name being queried.
2025-10-02 12:54:42 +02:00
Colin Vidal
24011e205c remove CHECK_FOR_GLUE_IN_ANSWER
Macro CHECK_FOR_GLUE_IN_ANSWER is defined in `lib/dns/resolver.c` only,
documented nowhere and not exposed as build configuration. This is valid
at least for 9.21+, 9.20 and 9.18. Furthermore, it doesn't compile
anymore on 9.21+ with -DCHECK_FOR_GLUE_IN_ANSWER=1.

Considering it is very unlikely that anyone build named with this,
remove the code rather than fixing it.
2025-09-30 10:05:22 +02:00
Mark Andrews
ccc41c7044 re-split STATIC_ASSERT message 2025-09-24 12:14:28 +00:00
Mark Andrews
a64c350523 re-split log message text 2025-09-24 12:14:28 +00:00
Ondřej Surý
0dcf56d5e3
fixup! Use lock-free hashtable for storing resolver fetch contexts 2025-09-24 00:08:21 +02:00