Ensure that we don't attempt an ACL match for answer addresses
when handling a class-CHAOS zone. This is an additional line of
defense for YWH-PGM40640-74.
(cherry picked from commit e62673c765b52307c800e86f0185fe52b573c145)
When a SIG(0)-signed response triggers async ECDSA verification via
dns_message_checksig_async(), the respctx_t holds a raw pointer to
the resquery_t. If the fetch context is shut down while verification
is in flight (e.g. due to recursive-clients quota exhaustion), the
query is destroyed and the callback dereferences a dangling pointer.
Take a reference on the resquery_t when initializing the respctx_t,
and release it in both cleanup paths. The query's own reference to
the fetch context keeps the fctx alive transitively.
(cherry picked from commit 5b58caf5a2cd39d57a51b7b0373bfbc4877a96f9)
The SLIST (essentially `fctx->finds`, forwarders and dual-stack
alternatives aside) can have duplicate server addresses when multiple
in-domain nameservers share the same IP addresses:
sub.example. NS ns1.sub.example.
sub.example. NS ns2.sub.example.
ns1.sub.example. A 1.2.3.4
ns1.sub.example. A 5.6.7.8
ns2.sub.example. A 1.2.3.4
ns2.sub.example. A 5.6.7.8
If both 1.2.3.4 and 5.6.7.8 fail to return a valid answer, the resolver
would query each address twice.
The problem is fixed by replacing the two-phase server selection (sort
each find list by SRTT, sort finds by head SRTT) with a single linear
scan in nextaddress() that finds the lowest-SRTT unmarked, non-duplicate
address across all find lists.
The old approach had a correctness bug: after sorting, the resolver
picked the next address from the "current" find list rather than
globally. For example, with find lists [1, 15, 26] and [3, 4, 5], the
second pick would be SRTT 15 instead of the correct SRTT 3.
The new approach is both simpler and correct: each call to nextaddress()
walks all addresses, skips marked and duplicate entries, and returns the
one with the lowest SRTT. While this walk is repeated for each server
attempt, it operates on a small bounded list and is negligible compared
to the network I/O of querying the server.
(cherry picked from commit b1c5856a3764b4025e93f8baf06c45c8fa029752)
Calls to `rctx_resend()` are done internally within the resolver, in
flow which are not supposed to happens more than once. For instance,
if some query fails, and a specific flag "F" wasn't set, then set the
flag and try again. This wouldn't occur more than once because if the
query fails the next attempt, the flag "F" would be set already, so the
resolver would move to the next server (or give up).
However, a subtle bug missing checking a flag, for instance, could lead
to an unbounded loop re-trying to query the same server. This is now
impossible as `rctx_resend()` also increment the query counters (so if
such case occurs, it would stop once the maximum limit is reached).
The dns_resstatscounter_retry are also only incremented if the
`fctx_query()` succeeds, similar to as is done in `fctx_try()`.
(cherry picked from commit f3e74304889a2e8b69c8e88fc9a383589decda32)
Move the logic incrementing the query counter and the global query
counter into a dedicated helper function.
(cherry picked from commit 05d6da2de54c093689e675e81ae898ee41220666)
Previously, the user of dns_dispatch API had to first call
dns_dispatch_gettcp() and if that failed create a new TCP dispatch with
dns_dispatch_createtcp(). This has been changed and the TCP connection
reuse happens transparently inside dns_dispatch_createtcp(). There are
separate buckets for dns_resolver, dns_request and dns_xfrin units, so
these don't get mixed together.
(cherry picked from commit d5ee86b799)
Move the write to fctx->vresult after LOCK(&fctx->lock). The field was
being set before acquiring the lock, but dns_resolver_logfetch() reads
it under the same lock from another thread.
(cherry picked from commit a2bd833909)
The error cleanup in fctx_create() was missing isc_mutex_destroy() and
dns_ede_invalidate() calls. When error paths (cleanup_nameservers,
cleanup_fcount, cleanup_qmessage, cleanup_adb) were taken after the
mutex and edectx were initialized, the fctx memory was freed without
properly destroying these resources first.
(cherry picked from commit 5b1750f15f)
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference.
When dns_dispatch_connect() fails synchronously on TCP (e.g. from
dns_transport_get_tlsctx() failing in tcp_dispatch_connect()), the
connect callback is never scheduled, so the extra reference is never
consumed. The error path then tears down the query via manual cleanup
(isc_mem_put) without going through the refcount destructor, leaving
the reference imbalanced.
Fix by dropping the extra reference on the error path, just after
dns_dispatch_done() which cleans up the dispatch entry.
(cherry picked from commit 2da669490c)
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.
This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.
This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
(cherry picked from commit 3c33e7d937)
When selecting nameserver addresses to be looked up we where
always selecting them in dnssec name order from the start of
the nameserver rrset. This could lead to resolution failure
despite there being address that could be resolved for the
other names. Use a random starting point when selecting which
names to lookup.
(cherry picked from commit b78052119a)
The fetch loop detection occured in two places: when
`dns_resolver_createfetch()` is invoked (looking up through the parent
fetches chain and stops the fetch if a parent fetch is the same qname and
qtype) and right after calling `dns_adb_findname()` in the resolver
(stops the fetch if the current fetch is the same name from the ADB
lookup, and ADB lookup needs to fetch it).
Regarding fetch loop detection at the `dns_resulver_createfetch()`
entry, there are case where both qname and qtype are similar but the
zonecut is different. This will then query different name servers and
get different responses. For instance, the following delegation
parent-side (both for `foo.example.` and `dnshost.example.`):
foo.example. 3600 NS ns.dnshost.example.
dnshost.example. 3600 NS ns.dnshost.example.
ns.dnshost.example. 3600 A 1.2.3.4
Then the child-side of `dnshost.example.`:
dnshost.example. 300 NS ns.dnshost.example.
ns.dnshost.example. 300 A 1.2.3.4
Then the child-side of `foo.example.`:
foo.example 3600 NS ns.dnshost.example.
a.foo.example 300 A 5.6.7.8
Obviously, there is a misconfiguration between the parent-side and the
child-side of `dnshost.example` (the mismatch of the TTL), but, this
happens...
Because the resolver is currently child-centric, the parent-side
delegation's glue of `dnshost.example.` will be overriden by the
child-side of the delegation. Once both A records will expires, the
resolver will attempt to find out the A RRs but will start from the
`foo.example.` zonecut, as the delegation itself is still valid.
Then the resolver will attempt to resolve `ns.dnshost.example.`, still
using the `foo.example.` zonecut, which will immediately trigger another
attempt to resolve `ns.foo.example.` (because the A RR is expired). This
is, however _not_ a loop, because the second attempt will have
`dnshost.example.` zonecut. And this changes everything, because the
resolver detects the A name is in-domain, and pass a flag to ADB so
`dns_view_find()` won't use the cache. As a result, the zonecut will be
`.`, and the hints (root servers) will be queried instead.
From that point, they'll return the parent-side delegation, which
includes the glue for `ns.dnshost.example/A`, and the resolution can
continue. Previously, this wouldn't be possible because a loop would be
detected from the second attempt to looking `ns.foo.example/A` and would
result in a SERVFAIL.
Now, the loop detection is relaxed as the loop is detected if the qname,
qtype _and_ zonecut are equals.
This commit also changes the way the loop detection post
`dns_adb_createfind()` works. From the same example above, there would
be two ADB fetches with the same name, but with two different ADB flags
(the first one without DNS_ADB_STARTATZONE, the second one with that
flag). It means that there will be two fetches out of those two ADB
lookups, both legit, and not a loop (i.e. it won't be stuck). To
differenciate between a find which has a pending fetch (which could be
from another find the current find has been attached to), a new find
option `DNS_ADBFIND_STARTEDFETCH` is introduced, which tells that the
current has did started a fetch.
That way, if a find doesn't have `DNS_ADBFIND_STARTEDFETCH` option but
has pending fetches, we know this is a find attached to a similar find
so this is a loop. Otherwise, with `DNS_ADBFIND_STARTEDFETCH`, we know
that even if there is a pending fetch, this is not a loop as the fetch
has just been started
(cherry picked from commit f623ab1fb3)
It is possible to have a fetch that is active, but it has been cloned,
so it won't be used when found in the hash table. The fetch options
also prevent matching in the hash table, so add a hexadecimal dump of
the fctx->options to the output.
(cherry picked from commit 23ae5544be)
Maintain the relationship between the parent and child fetch and when
creating a new child fetch, properly check the resolution loops that
would lead to a new fetch would join one of the parent's fetch contexts.
(cherry picked from commit 4d307ac67a)
When named is being reconfigured, it detaches from the old
'isc_tlsctx_cache_t' TLS context cache object and creates a
new one. This can cause an assertion failure within the
resolver when the object is destroyed while still in use,
because the resolver is using the object without getting
attached to it.
Add an attach/detach so that the 'isc_tlsctx_cache_t' doesn't
get destroyed while still being in use.
(cherry picked from commit ed7b08c0c4)
The dns_resolver mode of operation is to resolve all the domains as it
iterates the DNS tree to fill up the cache as quickly as possible.
This commit reduces the number of outgoing queries by reducing the
number of remote fetches started for the nameserver addresses resolution
via dns_adb_createfind() to a smaller number per depth of the recursion
since the delegation point (3 2 1 0) - where 0 means only create fetch
on demand if we don't have any addresses yet.
(cherry picked from commit 1b90d2ffdb)
While shutting down view->dispatchmgr is no longer valid. Attach
to it and when creates a fetch context and use that pointer instead
of view->dispatchmgr. Use dns_view_getdispatchmgr to do the attaching
as view->dispatchmgr is it managed using rcu.
(cherry picked from commit 012a47476d)
To prevent spoofed unsigned DNAME responses being accepted retry
response with unsigned DNAMEs over TCP if the response is not TSIG
signed or there isn't a good DNS CLIENT COOKIE.
(cherry picked from commit 2e40705c06)
Use the owner name of the NS record as the bailwick apex name
when determining which additional records to cache, rather than
the name of the delegating zone (or a parent thereof).
(cherry picked from commit a41054e9e6)
To prevent certain spoofing attacks, a new check has been added
to the existing rules for whether NS data can be cached: the owner
name of the NS RRset must be an ancestor of the name being queried.
(cherry picked from commit fa153f791f)
The code to test whether to store the RRSIGs on DNS_R_UNCHANGED
with CD=1 was failing because the comparison methods of the two
rdatatset instances were not compatible. Move the testing into
dns_db_addrdataset(), and request it by setting the DNS_ADD_EQUALOK
option. If the option is set and the old and new rrsets compare
as equal, dns_db_addrdataset() returns ISC_R_SUCCESS instead of
DNS_R_UNCHANGED.
(cherry picked from commit b954a1df43)
Print optionally a bit more details not passed to event in case
dns_view_findzonecut returns unexpected result. Result would be
visible later in foundevent, but found fname would be lost. Print it
into the log.
(cherry picked from commit d2c6966232)
When authoritative zone is loaded when query minimization query for the
same zone is already pending, it might receive unexpected result codes.
Normally DNS_R_CNAME would follow to query_cname after processing sent
events, but dns_view_findzonecut does not fill CNAME target into
event->foundevent. Usual lookup via query_lookup would always have that
filled.
Ideally we would restart the query with unmodified search name, if
unexpected change from recursing to local zone cut were detected. Until
dns_view_findzonecut is modified to export zone/cache source of the cut,
at least fail queries which went into unexpected state.
(cherry picked from commit 2fd3da54f9)
Caching prevents server upgrades being detected in a timely manner
and it can also prevent DNSSEC responses being requested.
(cherry picked from commit 90b2f94d9b)
the fix in commit 1edbbc32b4 was incomplete; the wrong
event result could also be set in cache_name() and validated().
(cherry picked from commit 9ebeb60174)
when the caching of a negative record failed because of the
presence of a positive one, ncache_adderesult() could override
this to ISC_R_SUCCESS. this could cause CNAME and DNAME responses
to be handled incorrectly. ncache_adderesult() now sets the result
code correctly in such cases.
(cherry picked from commit 1edbbc32b4)
The fetch context that held these values could be freed while there
were still active pointers to the memory. Using a reference counted
pointer avoids this.
(cherry picked from commit bfbaacc9a0)
When a response times out the fctx_cancelquery() function
incorrectly calculates it in the 'dns_resstatscounter_queryrtt5'
counter (i.e. >=1600 ms). To avoid this, the rctx_timedout()
function should make sure that 'rctx->finish' is NULL. And in order
to adjust the RTT values for the timed out server, 'rctx->no_response'
should be true. Update the rctx_timedout() function to make those
changes.
(cherry picked from commit 830e548111)
The resquery_response() function increases the response counter without
checking if the response was successful. Increase the counter only when
the result indicates success.
(cherry picked from commit 12e7dfa397)
Named was stopping nameserver address resolution attempts too soon
when dual stack servers are configured. Dual stack servers are
used when there are *not* addresses for the server in a particular
address family so find->status == DNS_ADB_NOMOREADDRESSES is not a
sufficient stopping condition when dual stack servers are available.
Call fctx_try to see if the alternate servers can be used.
(cherry picked from commit f98a8331aa)
Previously, the hashmap iterator for fetches-per-zone was destroy
outside the rwlock. This could lead to an assertion failure due to a
timing race with the internal rehashing of the hashmap table as the
rehashing process requires no iterators to be running when rehashing the
hashmap table. This has been fixed by moving the destruction of the
iterator inside the read locked section.
(cherry picked from commit 1e4fb53c61)
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
(cherry picked from commit d0fd9cbe3b)
Previously, the dns_resolver_dumpfetches() would go over the fetch
counters. Alas, because of the earlier optimization, the fetch counters
would be increased only when fetches-per-zone was not 0, otherwise the
whole counting was skipped for performance reasons.
Instead of using the auxiliary fetch counters hash table, use the real
hash table that stores the fetch contexts to dump the ongoing fetches to
the recursing file.
Additionally print more information about the fetch context like start
and expiry times, number of fetch responses, number of queries and count
of allowed and dropped fetches.
(cherry picked from commit c6b0368b21)
The order of the fetch context hash table rwlock and the individual
fetch context was reversed when calling the release_fctx() function.
This was causing a problem when iterating the hash table, and thus the
ordering has been corrected in a way that the hash table rwlock is now
always locked on the outside and the fctx lock is the interior lock.
(cherry picked from commit cf078fadeb)
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.
When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.
(cherry picked from commit 6aba56ae89)
Extended DNS error 22 (No reachable authority) was previously detected
when `fctx_expired` fired. It turns out this function is used as a
"safety net" and the timeout detection should be caught earlier.
It was working though, because of another issue fixed by !9927. Since
this change, the recursive request timed out detection occurs before
`fctx_expired` so EDE 22 is not added to the response message anymore.
The fix of the problem is to add the EDE 22 code in two situations:
- When the dispatch code timed out (rctx_timedout) the resolver code
checks various properties to figure out if it needs to make another
fetch attempt. One of the paramters if the fetch expiration time. If
it expires, the whole recursion is canceled, so it now adds the EDE 22
code.
- If the fetch expiration time doesn't expires in the case above (and
other parameters allows it) a new fetch attempt is made (fctx_query).
But before the new request is actually made, the fetch expiration time
is re-checked. It might then has elapsed, and the whole recursion is
canceled. So it now also adds the EDE 22 code here as well.
(cherry picked from commit 78274ec2b1)
Instead of mixing the dns_resolver and dns_validator units directly with
the EDE code, split-out the dns_ede functionality into own separate
compilation unit and hide the implementation details behind abstraction.
Additionally, the EDE codes are directly copied into the ns_client
buffers by passing the EDE context to dns_resolver_createfetch().
This makes the dns_ede implementation simpler to use, although sligtly
more complicated on the inside.
Co-authored-by: Colin Vidal <colin@isc.org>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
(cherry picked from commit 2f8e0edf3b)
Add support for EDE codes 1 (Unsupported DNSKEY Algorithm) and 2
(Unsupported DS Digest Type) which might occurs during DNSSEC
validation in case of unsupported DNSKEY algorithm or DS digest type.
Because DNSSEC internally kicks off various fetches, we need to copy
all encountered extended errors from fetch responses to the fetch
context. Upon an event, the errors from the fetch context are copied
to the client response.
(cherry picked from commit 46a58acdf5)
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
(cherry picked from commit 9f945c8b67)
When answering queries, don't add data to the additional section if
the answer has more than 13 names in the RDATA. This limits the
number of lookups into the database(s) during a single client query,
reducing query processing load.
Also, don't append any additional data to type=ANY queries. The
answer to ANY is already big enough.
(cherry picked from commit a1982cf1bb)
While implementing the global limit 'max-query-count', initially I
thought adding the variable to the resolver structure. But the limit
is per client request so it was moved to the view structure (and
counter in ns_query structure). However, I forgot to remove the
variable from the resolver structure again. This commit fixes that.
(cherry picked from commit 397ca34e34)
Add another option to configure how many outgoing queries per
client request is allowed. The existing 'max-recursion-queries' is
per restart, this one is a global limit.
(cherry picked from commit bbc16cc8e6)
Add support for Extended DNS Errors (EDE) error 22: No reachable
authority. This occurs when after a timeout delay when the resolver is
trying to query an authority server.
(cherry picked from commit d13e94b930)
Commit amended in order to fix usage of isc_log_write (adding dns_lctx
parameter)