BN_num_bits() returns 0 when passed NULL and a negative value on
internal error. The OpenSSL wrappers stored the result in a size_t,
so a 0 return falsely satisfied the bit-length check and a negative
return wrapped to a huge value. Capture the int return, reject
non-positive values, then compare against the limit.
A signature cannot cover a meta-type (NONE, ANY, AXFR, IXFR, MAILB,
MAILA, OPT, TSIG, TKEY); previously such records were cached by the
recursive resolver and collided with negative-cache entries on the
same owner name, corrupting the QP-trie cache.
Assisted-by: Claude:claude-opus-4-7
The dns_db_deleterdataset() removing the corresponding signature
within the iterator is wrong, because it mutates an rdataset
that is not the current one.
The hint feeds the EDNS OPT UDP-size field, which has no effect on TCP
transport. Avoid the dns_adb_getudpsize() lookup when the query is
already pinned to TCP.
Assisted-by: Claude:claude-opus-4-7
Two exits from fctx_try() landed at DNS_R_SERVFAIL without attaching
DNS_EDE_NOREACHABLEAUTH: when fctx_getaddresses() returned a non-success,
non-wait status, and when every candidate addrinfo was unusable
(over-quota or filtered) after a restart.
With the new TCP fallback actually firing, those paths are now reached
by serve-stale and similar scenarios in which the auth is unreachable.
Attach the EDE so SERVFAIL responses keep carrying the same operator
signal that the timeout-based exit paths already produce.
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
The TCP-fallback fix in the previous commits means a query that would
previously have timed out on UDP now actually escalates to TCP, and a
TCP-side failure surfaces a non-ISC_R_TIMEDOUT result code to
query_usestale(). The trigger for DNS_DBFIND_STALESTART was previously
narrowed to ISC_R_TIMEDOUT, so the stale-refresh-time window stopped
opening for those clients.
Broaden the condition to any failure that has already cleared the
upstream DUPLICATE/DROP filtering in query_usestale() — the spirit of
the window is "the resolver tried and could not get a fresh answer",
not "the resolver timed out specifically".
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
Make the decision in fctx_query() before the dispatch is bound so the
chosen transport and the DNS_FETCHOPT_TCP flag agree. The previous
location in resquery_send() ran after the UDP dispatch had already been
attached, so the flag flip had no effect on the wire.
Moving the decision earlier also means FCTX_ADDRINFO_NOEDNS0 servers,
previously exempt, now escalate to TCP too. TCP works regardless of
EDNS state, so this is the intended behaviour.
Assisted-by: Claude:claude-opus-4-7
The retry path in resquery_send() that flipped DNS_FETCHOPT_TCP on a
query whose dispatch had already been bound as UDP in fctx_query() had
no effect on the transport actually used, but did leave a stale TCP
bit visible to downstream consumers (dnstap framing, cookie checks,
the AUTHORITY-NS spoofability guard).
The ineffective code has been removed from resquery_send(). The
TCP fallback functionality will be corrected and restored in the next
commit.
Assisted-by: Claude:claude-opus-4-7
Two TSIG-authenticated TKEY DELETE queries for the same dynamic key,
arriving on different worker loops, could each enter
dns_tsigkey_delete() and cause over-decrementing the key refcount.
This has been fixed by making dns_tsigkey_delete() idempotent.
CNAME and other record types cannot coexist. DNSSEC records are the
exceptions to this rule.
If the answer contains a name with a CNAME, remove existing RRsets at
the same name from the cache.
If the answer contains a name without a CNAME, remove the CNAME RRset
at the same name from the cache.
SipHash-2-4 was designed as a conservative PRF/MAC with extra rounds
against future attacks. For hash tables, where outputs are never
exposed, SipHash-1-3 provides sufficient collision resistance with
fewer rounds. As the SipHash author noted: "I would be very surprised
if SipHash-1-3 introduced weaknesses for hash tables."
DNS cookies continue to use SipHash-2-4 since cookie values are sent
on the wire and must resist online attacks.
Pull the dns_message_findname() lookups into cache_delegglue() and
cache_delegglue6() so each helper now owns its glue lookup and returns
the number of addresses cached. cache_delegns() splits referrals into
two cases: in-domain (the NS name is below the delegation point) and
sibling/in-bailiwick.
An in-domain NS without glue is unresolvable by definition - the
resolver would have to ask the very server it's trying to find. Log
"missing mandatory glue" at notice level and skip the deleg entirely
rather than leaving an unusable entry in the set. A new
dns_delegset_freedeleg() undoes a fresh dns_delegset_allocdeleg() so
the rest of the delegation set is preserved.
Until now, the dispatcher silently dropped UDP responses from the
expected peer that carried the wrong DNS message id and kept listening
for the correct id to arrive within the read timeout. An off-path
attacker who knows the destination address and source port of an
outgoing fetch could exploit that quiet retry window to flood the
resolver with guessed responses; with a gigabit link the per-query
success probability grows linearly with the number of guesses that
arrive before the legitimate answer or the timeout.
Treat any such mismatch as a possible spoofing attempt and let the
resolver immediately retry the same query over TCP, the same control
path the truncation handler already uses.
Add a resolver statistics counter - exposed as 'queries retried over TCP
after a response with mismatched query id' in rndc stats and
'MismatchTCP' in the statistics channel
Assisted-by: Claude:claude-opus-4-7
Bouncing the offload itself to the target loop let the after-work
callback fire on the target thread and run the user's done callback
before the calling thread had published *dctxp / *lctxp. Enqueue on
the calling loop and bounce only the done callback instead, so the
publish is sequenced before the cross-thread hand-off by construction
and cannot be reintroduced by reordering the entry-point body.
The resolver populated the delegation database with every NS RR and
every glue address from a referral, with no aggregate bound. Resolution
only ever uses the first max-delegation-servers NS owners and a handful
of addresses per NS, so anything beyond that is dead memory.
Stop the NS loop in cache_delegns() at view->max_delegation_servers and
cap each glue rdataset at DELEG_MAX_GLUES_PER_NS (20) addresses, so each
NS owner contributes at most 20 A and 20 AAAA glues.
The dns_glue struct currently contains four dns_rdataset structs to hold
the glue. These structs are over 100 bytes each because they need to be
able to hold data for multiple types of databases.
Since the dns_glue_t type is only used by qpzone, we can instead hold
pointers to the vecheaders directly, and only bind the vecheaders to
the rdatasets when adding the glue to the message.
The dns_glue_t, dns_gluelist_t and dns_glue_additionaldata_ctx types are
only used in qpzone.c. This commits moves them to the private header
qpzone_p.h.
This is done in preparation of a followup commit that will refactor them
to use types that are private to qpzone.
The two new call sites added by the CLASS-validation work passed NULL
as the reason, but ns_client_dumpmessage() bails out early on a NULL
reason — so the message dump never happened. The intent was to dump
the message and let the follow-up ns_client_log() carry the reason
text, so pass "" to suppress the prefix without short-circuiting the
dump.
After the send callback completes, the UV request is freed but
the HTTP/2 socket's write buffer still points to the freed memory.
If nghttp2 subsequently needs to send frames (e.g. SETTINGS ACK),
the server_read_callback reads from the dangling buffer.
Clear the write buffer before freeing the UV request.
Replace the hysteretic hi_water/lo_water switch with a stochastic
check: always false below lo_water, always true at or above hi_water,
linearly ramped probability in between. This spreads cache cleaning
across many inserts instead of triggering a thundering herd once the
hi_water mark is crossed (which causes every addrdataset to enter the
LRU purge path simultaneously and serializes lookups behind the node
write locks).
The is_overmem atomic and its stores are no longer needed and are
removed. The existing tests that asserted specific hysteretic state
transitions are simplified to check only the deterministic boundaries.
Ensure that we don't attempt an ACL match for answer addresses
when handling a class-CHAOS zone. This is an additional line of
defense for YWH-PGM40640-74.
NOTIFY and UPDATE messages must specify a data class in the
QUESTION/ZONE section. NONE and ANY are meta-classes and not
appropriate here. Return FORMERR if either is used.
Rejecting messages with a query class of NONE addresses YWH-PGM40640-72,
YWH-PGM40640-82, and YWH-PGM40640-83. Rejecting messages with a query
class of ANY addresses YWH-PGM40640-87, YWH-PGM40640-88, and
YWH-PGM40640-117.
Fixes: isc-projects/bind9#5778Fixes: isc-projects/bind9#5782Fixes: isc-projects/bind9#5783Fixes: isc-projects/bind9#5797Fixes: isc-projects/bind9#5798Fixes: isc-projects/bind9#5853
Reject requests with unsupported or misused CLASS values before
further processing. Only IN, CH, HS, RESERVED0 (for DNS Cookies),
ANY (for TKEY negotiation), and NONE (for DNS UPDATE) are accepted;
all other classes return NOTIMP. Misuse of NONE or ANY outside
their allowed contexts returns FORMERR.
This adds further protection against bugs of the same general class
as YWH-PGM40640-70 and YWH-PGM40640-73.
Return NOTIMP for UPDATE and NOTIFY requests received for views with a
class other than IN. Only QUERY is now supported for non-IN views such
as CHAOS.
When running dns dns_rdata_tostruct() with types that are only defined
for class IN, ensure that the class is correct before proceeding.
Add an assertion that any zone being updated is of class IN. (Note
that previously, a DLZ zone could have its class value set incorrectly
to NONE; this has been fixed.)
This addresses YWH-PGM40640-70 and YWH-PGM40640-73 (as well as any
similar problems that might have occurred in the future) by minimizing
the code paths that can be reached by rdata classes other than IN, so it
is safe for the implementation to assume that rdatatypes that are only
defined for class IN, such as SVCB or WKS, have been parsed and
validated, and not accepted as unknown/opaque data.
Fixes: isc-projects/bind9#5777Fixes: isc-projects/bind9#5779
Force recursion off, and set allow-recursion/allow-recursion-on ACLs
to none, for views with a class other than IN. Log a configuration
warning if recursion is explicitly enabled for a non-IN view.
This addresses YWH-PGM40640-74 and YWH-PGM40640-75 by preventing any
attempt at recursive processing in a class-CHAOS view, ensuring that
server addresses used for recursive queries and received in recursive
responses are of the expected format.
Fixes: isc-projects/bind9#5780Fixes: isc-projects/bind9#5781
RFC 3645 Section 3.1.1 mandates that the client MUST abandon the
algorithm if replay_det_state is FALSE after GSS_Init_sec_context
completes. The previous commit checked MUTUAL and INTEG but missed
REPLAY, even though it was already requested in the input flags.
Add GSS_C_REPLAY_FLAG to the ret_flags bitmask check so all three
required properties (replay detection, mutual authentication, and
integrity) are verified.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After gss_accept_sec_context() completes, verify that the INTEG flag
is set in ret_flags. Without integrity protection, GSS-TSIG message
authentication cannot function correctly.
The server side was previously passing NULL for ret_flags, meaning it
never verified the negotiated security properties. The client side
was fixed in the previous commit; this fixes the server side.
After gss_init_sec_context() completes, verify that both MUTUAL and
INTEG flags are set in ret_flags. RFC 3645 Section 3.1.1 requires
the client to abandon the algorithm if either flag is missing, as
the security context would not provide mutual authentication or
message integrity.
Also fix uninitialized gss_name_t variable in dst_gssapi_initctx()
that could cause undefined behavior if gss_import_name() fails and
the cleanup path calls gss_release_name() on the uninitialized
value.
In dst_gssapi_acceptctx(), rename outtoken to outtokenp (matching BIND
convention for output pointer parameters) and free the allocated output
token buffer on error in the cleanup path.
In process_gsstkey(), route the empty-principal error path through
cleanup via CLEANUP() instead of returning early, so that the output
token, GSS context, and TSIG key are all freed consistently by the
existing cleanup block.
Reject multi-round GSS-API negotiation (GSS_S_CONTINUE_NEEDED) in
dst_gssapi_acceptctx(). Each call to gss_accept_sec_context()
allocates a context inside the GSS library; without this fix, the
context handle was passed back to process_gsstkey() which did not
store it persistently, leaking it on every incomplete negotiation.
An unauthenticated attacker could exhaust server memory by sending
repeated TKEY queries with GSSAPI tokens, each leaking one GSS
context. The leaked memory is allocated by the GSS library via
malloc(), bypassing BIND's memory accounting.
In practice, Kerberos/SPNEGO (the only mechanism used with BIND)
completes in a single round, so rejecting continuation does not
affect real-world deployments. See RFC 3645 Section 4.1.3.
When a SIG(0)-signed response triggers async ECDSA verification via
dns_message_checksig_async(), the respctx_t holds a raw pointer to
the resquery_t. If the fetch context is shut down while verification
is in flight (e.g. due to recursive-clients quota exhaustion), the
query is destroyed and the callback dereferences a dangling pointer.
Take a reference on the resquery_t when initializing the respctx_t,
and release it in both cleanup paths. The query's own reference to
the fetch context keeps the fctx alive transitively.
The SLIST (essentially `fctx->finds`, forwarders and dual-stack
alternatives aside) can have duplicate server addresses when multiple
in-domain nameservers share the same IP addresses:
sub.example. NS ns1.sub.example.
sub.example. NS ns2.sub.example.
ns1.sub.example. A 1.2.3.4
ns1.sub.example. A 5.6.7.8
ns2.sub.example. A 1.2.3.4
ns2.sub.example. A 5.6.7.8
If both 1.2.3.4 and 5.6.7.8 fail to return a valid answer, the resolver
would query each address twice.
The problem is fixed by replacing the two-phase server selection (sort
each find list by SRTT, sort finds by head SRTT) with a single linear
scan in nextaddress() that finds the lowest-SRTT unmarked, non-duplicate
address across all find lists.
The old approach had a correctness bug: after sorting, the resolver
picked the next address from the "current" find list rather than
globally. For example, with find lists [1, 15, 26] and [3, 4, 5], the
second pick would be SRTT 15 instead of the correct SRTT 3.
The new approach is both simpler and correct: each call to nextaddress()
walks all addresses, skips marked and duplicate entries, and returns the
one with the lowest SRTT. While this walk is repeated for each server
attempt, it operates on a small bounded list and is negligible compared
to the network I/O of querying the server.
The number of `dns_adbaddrfind_t` (NS address with metadata like SRTT)
returned from an ADB NS name lookup is now limited by the caller. The
default value (outside the resolver) uses `max-delegation-servers`, and
the resolver, for a given fetch, start with `max-delegation-servers` and
decrement it at each ADB fetch. This ensures that, for a given
delegation, no more than 13 nameservers will be contacted.
This is the same mechanism used when looking up `dns_adbaddrfind_t` from
a list of glues (addresses).
When an upstream server answers BADCOOKIE, no matter the transport used,
the resolver eventually resends the query using TCP. However, if the
upstream server responds with BADCOOKIE again over TCP, the resolver
would keep resending until the maximum query count is reached.
This is now fixed by stopping resending once the query has already been
sent over TCP.
Calls to `rctx_resend()` are done internally within the resolver, in
flow which are not supposed to happens more than once. For instance,
if some query fails, and a specific flag "F" wasn't set, then set the
flag and try again. This wouldn't occur more than once because if the
query fails the next attempt, the flag "F" would be set already, so the
resolver would move to the next server (or give up).
However, a subtle bug missing checking a flag, for instance, could lead
to an unbounded loop re-trying to query the same server. This is now
impossible as `rctx_resend()` also increment the query counters (so if
such case occurs, it would stop once the maximum limit is reached).
The dns_resstatscounter_retry are also only incremented if the
`fctx_query()` succeeds, similar to as is done in `fctx_try()`.
When a validator is being shut down, the associated name
`val->name` is set to NULL. This could cause a crash if a worker
thread subsequently added an EDE code to the response containing
val->name in the extra text.
`validator_addede()` now checks whether the name is NULL before
trying to add it to the extra text.
The allow-transfer/allow-query catalog zone custom properties support
only APL RRtypes. All other types are correctly rejected by the
catz_process_apl() function. However, when an APL RRtype is processed
by that function, and another (non-APL) RRtype is then attempted to be
processed, there is an assertion failure happening in the prologue
of the function because `*aclbp != NULL` (i.e. an APL has been already
processed). Move the code to do type checking before the affected
REQUIRE assertion.
The DNS64 state information stored in client->query.dns64_aaaaok
could cause an assertion failure in query_respond() if the server
was configured in such a way as to trigger a new recursion before
the query had been reset - for example, by using the filter-aaaa
plugin, which may need to recurse to find out whether an A record
exists.
This has been addressed by clearing DNS64 state information
immediately after the call to query_filter64().
In previous_closest_nsec(), a new qpreader was opened to search the NSEC
tree. It was possible for that to be used to update a QP iterator object
owned by the caller, and then be destroyed when the function returned.
This qpreader object isn't necessary anymore; since namespaces were
added to the QP trie in commit 15653c54a0, we can now just reuse the
existing reader for the main tree.
Each dns__nta_t now references its parent ntatable in nta_create() and
releases it in dns__nta_destroy(). This avoids a use-after-free in
fetch_done() and other callbacks that dereference nta->ntatable: the
ntatable could otherwise be released by view destruction while an
in-flight resolver fetch still holds a reference to the NTA.
makeslab(), makevec(), dns_rdatavec_merge() and dns_rdatavec_subtract()
summed per-record storage into an unsigned int with no upper-bound
check. An RRset whose total encoded size exceeds DNS_RDATA_MAXLENGTH
cannot fit in a DNS message and is unservable; building its in-memory
representation only burns memory on data that will fail at response
time, and at the upper bound the running sum could in theory wrap.
Cap the running total at DNS_RDATA_MAXLENGTH and return ISC_R_NOSPACE
when exceeded. Update the qpdb cache memory-purge test to use a
record size that fits within the new limit.
Assisted-by: Claude:claude-opus-4-7
RFC 3445 also eliminated the DNS_KEYTYPE_NOAUTH, DNS_KEYTYPE_NOCONF,
and DNS_KEYOWNER_ENTITY flags. With NOAUTH and NOCONF gone, the
concept of NOKEY can no longer be expressed in KEY records.
DNS_KEYOWNER_ENTITY was already unused as of 22d688f656 but still
defined; that is now also removed.
The DNS_KEYFLAG_EXTENDED flag was only legitimate for type KEY
and was eliminated by RFC 3445. Dropping the extended-flags
handling in pub_compare() also fixes a possible crash when
signing a zone whose journal contains a crafted DNSKEY: a
6-byte record with the EXTENDED bit set produced a memmove()
length that underflowed and ran off a stack buffer.
The previous hash_key() was a deterministic, unkeyed (<<1) + add over the
key words. An off-path attacker could invert it offline and submit
queries whose source /24, qname hash, and qtype map to a single bucket;
under chaining this turns every lookup into an O(N) walk under
rrl->lock and starves legitimate query processing on the very feature
deployed to mitigate DoS.
Replace it with isc_hash32(), which is HalfSipHash-2-4 keyed by a
per-process random seed, so collision sets cannot be precomputed.
Assisted-by: Claude:claude-opus-4-7
Once the walk reaches the root, splitting one more label off would
trip an internal assertion and abort named. Stop cleanly with
ISC_R_NOTFOUND so the dispatcher cancels the fetch. Only reachable
through misconfiguration (root configured as a primary with parental
agents, or a parent zone that NODATAs its own NS).
Assisted-by: Claude:claude-opus-4-7