The function had a single caller, the HTTP/2 request-path handling in
the network manager, and its semantics (strlen() of the source clamped
to the requested size) amounted to an obscured bounded string copy.
Replace the only use with a plain allocation and strlcpy(), and drop the
function.
When the connect callback's result is ISC_R_SUCCESS and the callback
changes the result because of some condition, the 'xfr' should not
be detached, because it now belongs to the receive callback.
Detach the reference only if the callback's result is non-success.
A buffer used to dump a DNS name in the delegdb dump flow was using the
wrong size: it was using `DNS_NAME_MAXWIRE` which is the actual max
length of a DNS name on the wire instead of using `DNS_NAME_FORMATSIZE`
which is the maximum length of a textual representation of a DNS name
(which can be way longer than `DNS_NAME_MAXWIRE` if using the master
file escape sequence format) plus 1 (end of string byte). This could
lead to a buffer overflow. This is now fixed.
The geoip2.c:match_string() function can incorrectly return 'true'
when matching strings of different lengths (i.e. it matches a
substring). Return 'false' when the lengths of the matched strings
are different.
When a Dynamic Update is received that removes the DNSKEY (or CDNSKEY,
or CDS) RRset, remove all records except the ones that are in use
for signing for the zone (with dnssec-policy).
A resolver that validated DNSSEC accepted RSA DNSKEYs of any modulus
size up to OpenSSL's compile-time ceiling, and accepted any public
exponent the wire format could carry. RSA verification cost grows
sharply with the modulus length, so an authoritative server could
publish an oversized DNSKEY to make each signature check on the
resolver many times more expensive than for a normally sized key.
The intended verify-time cap had no effect because the helper it called
returned the public-exponent bit length rather than the modulus bit
length, so the test was always satisfied. Replace it with an honest
modulus-range check and a stricter exponent check that accepts only odd
exponents in the closed range [3, 2^32 + 1] (covering every Fermat
prime up to F5 and the odd intermediate values seen in deployed keys),
reject anything outside those bounds at every RSA key load path so an
invalid key never reaches the verifier, and keep the same checks at the
verifier as a backstop against future load paths.
The adaptive isc_rwlock (the modified C-RW-WP variant) synchronizes a
reader against a writer through a store-buffer handshake across two
independent atomic objects: the reader publishes its arrival in
readers_ingress and then reads writers_lock, while the writer publishes
its lock in writers_lock and then reads the reader indicator. With the
acquire/release ordering introduced by the 2021 simplification, neither
side is forced to observe the other's publish store before its own check
load, so on weak-memory targets a reader could see writers_lock unlocked
while the writer sees the indicator empty, and both would enter their
critical sections at once.
Restore the sequentially consistent ordering the original algorithm
specifies on the handshake atomics. The single total order over the
seq_cst operations is what forbids the overlap; targeting individual
fences is both more fragile and, on x86, more expensive. On x86 this
ordering is free (seq_cst loads remain plain loads and the RMWs remain
lock-prefixed); the added cost falls only on the weak-memory targets that
actually need it.
Only class IN is allowed for user-defined views; the internally
generated `_bind` view stays in the CH class. Both `named` and the
shared checker in `lib/isccfg/check.c` now reject non-IN views, so a
config can no longer pass `named-checkconf` yet fail to start in
`named`.
Tests, configs, and catalog zones using CH or arbitrary classes
(e.g. `class10`) are removed accordingly.
Previously, the node_deleg_t would do double allocation, one for the
struct itself and one for the zonecut. This has been changed to use
variable sized struct with the zonecut .ndata buffer attached to the end
of node_deleg_t structure.
For SIG(0)-signed requests, view matching is offloaded and the request
is finished asynchronously from ns_client_request_continue(), which
passes client->inner.buffer to dns_dt_send(). That buffer aliases the
network manager's receive buffer, only valid during the read callback,
so it may already be freed and reused, producing garbage dnstap frames
(e.g. the "upforwd" sig0-over-DoT test fails with UQ=0).
When the request is offloaded (ns_client_setup_view() returns
DNS_R_WAIT) and dnstap is enabled, copy the request buffer and point
client->inner.buffer at the copy so it survives the asynchronous hop;
free it in ns__client_reset_cb(). When dnstap is disabled there is no
async consumer of the buffer, so detach it from the receive buffer
instead.
Assisted-by: Claude:claude-opus-4-8
Eviction of an entry owned by another loop was bounced to that loop via
isc_async_run(), so a queued list removal could run after the cache had
freed its LRU lists. Use a single mutex-guarded LRU list instead, removing
entries synchronously under the lock, and let each entry hold its own
memory-context reference so the RCU free never touches a gone loop.
When `fctx_query()` is called, a DTrace probe (if enabled) prints the
fetch context address, the upstream server address and port, and the
latest known SRTT for the server.
The delegation database kept one SIEVE LRU list per loop so that node
eviction could run lock-free on each node's owning loop; this required
every node to hold a loop reference and to defer its own destruction to
that loop via isc_async_run(). Move the SIEVE unlink into the QP write
transaction, taking the evicted node directly from dns_qp_deletename(),
which serialises every list mutation under the qpmulti writer lock and
lets a single shared list replace the per-loop arrays. Node and database
teardown are now synchronous.
The QP trie and the SIEVE list are wrapped in a reference-counted holder.
Each node keeps a reference to the holder so it (and its memory context)
stays valid until the node is destroyed, while shutdown drains the SIEVE
and destroys the trie from an RCU callback and frees the holder once the
last node drops its reference. Reuse across a reconfiguration now moves
ownership of the holder to the new view instead of sharing it through a
separate owners counter, so dns_delegdb_reuse() is removed.
Normally, the tid_count is initialized only once at the beginning of the
application. The only exception is the pattern in the unit test where
isc_loopmgr is repeatedly created and torn down and each creation of
isc_loopmgr_t calls isc__tid_initcount() with the previous value.
ThreadSanitizer sees that as write operation on unprotected memory are
reports this as data race even though the value has not really changed.
This has been fixed by skipping the tid_count value update on repeated
calls.
A previous commit introduced a latent bug where the wrong popcount
definition was used when overriding the compilation mode to C23.
This commit fixes it.
The validator starts a DS fetch from three places while building or
proving a trust chain. Only validate_dnskey() handed the resolver the
parent zone cut as a delegation hint; the other two started the fetch
with no hint at all.
Factor the shared setup into create_ds_fetch() and route all three
through it, so every validator DS fetch is created identically and
carries the parent zone cut. DS is an at-parent type, so the resolver
already anchors such a query at the parent on its own; supplying the
zone cut explicitly also lets the resolver's fetch loop detection match
the fetch by domain, which it cannot do for a fetch with no hint.
Assisted-by: Claude:claude-opus-4-8
A query for an RRSIG is handled as a subset of ANY, so rctx_answer_any()
filters out records that do not match the queried type. When every
record was filtered out (an answer carrying only unrelated types), the
function still returned success with nothing cached, and the fetch then
waited for a validator that was never started until the backstop fetch
timer fired ~12s later. Treat an all-filtered answer as a broken
response, matching how non-meta types already reject a reply with no
usable record.
The resolver turned a CNAME response to an RRSIG or NSEC query into
FORMERR inside rctx_answer_cname(). That is redundant -- every caller
already copes with a DNS_R_CNAME or DNS_R_DNAME result -- and it is the
wrong layer, because the resolver cannot tell a legitimate alias from a
broken one. Drop it; a CNAME for one of these types now flows back as
an ordinary alias.
The case that must be stopped lives in the validator. While proving an
unsigned CNAME insecure, proveunsecure() fetches the DS for the CNAME's
own name; because fetches are shared, that fetch re-enters and stalls on
the in-flight fetch the validator is waiting for, deadlocking for about
twelve seconds (GL#5878). Unlike the resolver, the validator knows it
is validating an alias, so check_chaining() now aborts a fetch whose
name matches the chaining rdataset's owner: it cannot advance the chain
and would only self-join.
dns_resolver_createfetch() guarded against fetch loops by comparing the
raw request name/type/domain before any fetch context existed. Move the
check after the context is obtained and run it against the context
itself, and only when we joined an already in-flight context
(!new_fctx) that is also an ancestor in the parent chain. That is the
real loop condition: the new fetch would block waiting on a fetch that
is itself waiting on us. A newly created context waits on nothing, so it
proceeds, bounded by the fetch depth limit and the complementary ADB
loop detection.
In MR !9740, we introduced an optimization that reduces memory usage
by processing rdatas in batches during AXFR.
The maximum batch size is 128, but the batch size was allowed to grow
beyond that limit if all rdatas in a batch were for the same name, as
that allows a more efficient optimization.
This optimization could theoretically allow the batch size arbitrarily
for a sufficient large zone transfer. Since synthetic tests don't show
any performance improvement from the optimization, this MR removes it.
Test that flushing the delegdb via `rndc flush` preserves its
configured size limit. The test checks delegdb watermarks after
`named` startup, flushes caches, and verifies that the delegdb
watermarks are correctly restored afterwards.
To distinguish between the previous `delegdb` memory contexts and the
new ones, we need to know exactly when the previous `delegdb` memory
contexts are removed (this is not immediate, since those are removed
during RCU reclamation phase). A trace is therefore added when a memory
context is destroyed, if `ISC_MEM_DEBUGTRACE` is set.
Instead of having independent APIs to configure various aspects of the
delegdb (i.e. cache size, other settings that may come up later), a
single configuration struct is passed to `dns_delegdb_setconfig()`, which
internally does all the plumbing. To avoid relying on
atomics/synchronization, `dns_delegdb_setconfig()` must be called from
exclusive mode (for now).
The configuration can be retrieved at any time (not necessarily from
exclusive mode) using `dns_delegdb_getconfig()`. This is useful, for
instance, to flush the delegdb without losing its parameters.
After SIG and NXT lost their special handling, KEY remained the only
RFC 2535-era type still receiving coexistence allowances: KEY
alongside CNAME at the same owner, KEY answered from the parent side
of a zone cut, KEY kept across CNAME eviction in the cache. RFC 3755
retains type 25 only for SIG(0) and TKEY transaction signatures, and
neither relies on those allowances in practice. The in-tree comment
that flagged the RFC 3007 parent-side carve-out as "unclear" predicted
this cleanup.
Zones that publish CNAME and KEY at the same owner — already invalid
under RFC 2181 — now fail to load. System test fixtures are updated
accordingly, and a new test asserts that SIG, NXT, and KEY records
pick up covering RRSIGs when their zone is signed.
RFC 3755 retired SIG and NXT in favour of RRSIG and NSEC. BIND still
warned about them at zone load, refused them in dynamic updates,
parsed SIG with a non-zero "type covered" field as a signature on an
RRset, and tracked them via dns_rdatatype_issig(). Those carve-outs
were the sole path that made the GL#5818 crash class reachable.
Treat both types as ordinary unknown rdata: they load, transfer, sign
and answer like any other record, and dynamic updates carry them
through the generic path. SIG(0) is unaffected; its message-parsing
carve-out is preserved.
redirect2() swaps qctx->db to the redirect zone before
query_nodata() runs. The DNS64 fallback there issues an A lookup
for the original query name, which is out of zone for the
redirect db, and the resulting query_notfound() trips
INSIST(!is_zone). The cached NCACHENXRRSET variant trips a
REQUIRE in dns_rdataset_first() on a disassociated rdataset.
The synth-from-dnssec entry reaches the same fallback via
query_coveringnsec(). Guarding the fallback with
!qctx->redirected leaves the nxdomain-redirect NXRRSET answer to
be served as-is.
When a dst_key_t carries a PKCS#11 URI in key->label (as named
does for dnssec-policy zones backed by a key-store "hsm"), key
generation must happen inside the HSM, not in software.
opensslecdsa_generate already branches on key->label and calls
the matching pkcs11 wrapper; the EDDSA generator silently ignored
the label and produced a software key, which named then wrote to
the .private file with both a Label: line and the raw PrivateKey:
bytes -- a corrupt hybrid record that prevented zone signing.
Add the missing wrapper:
- lib/isc/ossl_wrap/ossl3.c gains generate_pkcs11_eddsa_key()
and the public isc_ossl_wrap_generate_pkcs11_ed25519_key() /
isc_ossl_wrap_generate_pkcs11_ed448_key() entry points. They
use EVP_PKEY_CTX_new_from_name(NULL, "ED25519" or "ED448",
"provider=pkcs11") with the pkcs11_uri and pkcs11_key_usage
parameters, mirroring the existing EC wrapper.
- lib/isc/ossl_wrap/ossl1_1.c provides stubs returning
ISC_R_NOTIMPLEMENTED for the new EDDSA wrappers; the
pkcs11-provider stack requires OpenSSL 3. The pre-existing
isc_ossl_wrap_generate_pkcs11_rsa_key() stub used to silently
delegate to software keygen -- that hid the same "HSM label
on a software key" hazard for RSA on OpenSSL 1.1 builds, so
align it with the EDDSA stubs and return ISC_R_NOTIMPLEMENTED
too.
- lib/isc/include/isc/ossl_wrap.h declares the new wrappers.
- lib/dns/openssleddsa_link.c routes openssleddsa_generate()
through the new wrappers when key->label is non-NULL, leaving
the existing EVP_PKEY_keygen() path untouched for software
keys. The Ed448 case is guarded by HAVE_OPENSSL_ED448 to
match the surrounding code.
Assisted-by: Claude:claude-opus-4-7
openssleddsa_tofile() called EVP_PKEY_get_raw_private_key()
unconditionally whenever the dst_key_t had a private EVP_PKEY
attached and aborted with ISC_R_FAILURE on any error. That is
wrong for keys whose private material lives in a hardware token
(PKCS#11): the provider deliberately refuses to export the raw
bytes, but the keypair is still valid and the .private file
should be written containing only the PKCS#11 label, with no raw
key material. Without this, "dnssec-keyfromlabel -a ed25519 -l
pkcs11:..." fails with "failed to write key ...: failure" even
though pkcs11-tool has generated a valid Ed25519 key in SoftHSM.
Mirror the behaviour already implemented in opensslecdsa_tofile():
if the raw private key cannot be retrieved AND the key has a
PKCS#11 label to fall back on, clear the OpenSSL error queue and
fall through to writing just the Label element.
If extraction fails and there is no label to fall back on, return
the OpenSSL failure rather than silently producing a .private
file with neither raw key material nor a label, which would be
unusable on the next load.
Consolidate buffer cleanup into a single cleanup: path, freeing
with the original allocation size (alginfo->key_size) rather than
the potentially-modified len output parameter.
Assisted-by: Claude:claude-opus-4-7
copy_initfile() declared a size_t local variable to receive the size of
the initial file and passed it to isc_file_getsizefd() with an explicit
(off_t *) cast. On 32-bit platforms with _FILE_OFFSET_BITS=64, off_t is
8 bytes while size_t is only 4 bytes, so isc_file_getsizefd()'s
"*size = stats.st_size;" writes 8 bytes into the 4-byte slot and
clobbers the adjacent "output" FILE * on the stack. The next iteration
of the read/write loop then calls clearerr() through a NULL pointer and
named crashes with SIGSEGV.
This is triggered whenever a zone with an initial-file (e.g. one
configured via a template) is loaded for the first time, so on 32-bit
the addzone and masterfile system tests crash named in ns2 with cores.
Declare "len" as off_t to match the API and drop the unsafe cast.
Assisted-by: Claude:claude-opus-4-7
Since there is no `cfg_map_add()` anymore, and `cfg_map_addclone()`
wasn't quite clear, let's rename to `cfg_map_addclone()` into
`cfg_map_add()`, as this is fundamentally what this function is doing.
`cfg_map_findclause()` did not check whether a clause existed before
dereferencing it, which could lead to a NULL dereference. Add the
missing check to prevent this.
In practice, this was not triggering any known bug, since
`cfg_map_findclause()` is only called in contexts where the clause is
known to exist.
The static atomic last_purge is only read and written from mem_purge(),
which is compiled only when JEMALLOC_API_SUPPORTED or __GLIBC__ is
defined. This used to fail on OpenBSD:
../lib/isc/mem.c:405:31: error: unused variable 'last_purge' [-Werror,-Wunused-variable]
405 | static _Atomic(isc_stdtime_t) last_purge = 0;
| ^~~~~~~~~~
The NS_QUERYATTR_REDIRECT flag is set when processing a recursive
NXDOMAIN redirection lookup, so that if that lookup also returns
NXDOMAIN we don't end up looping.
Previously, the flag was left active after use, but if the
same client triggered a subsequent recursive lookup (for example,
in the filter-aaaa plugin), then the wrong branch could be reached
in query_resume(), potentially leading to an assertion failure. This
has been fixed.
BN_num_bits() returns 0 when passed NULL and a negative value on
internal error. The OpenSSL wrappers stored the result in a size_t,
so a 0 return falsely satisfied the bit-length check and a negative
return wrapped to a huge value. Capture the int return, reject
non-positive values, then compare against the limit.
A signature cannot cover a meta-type (NONE, ANY, AXFR, IXFR, MAILB,
MAILA, OPT, TSIG, TKEY); previously such records were cached by the
recursive resolver and collided with negative-cache entries on the
same owner name, corrupting the QP-trie cache.
Assisted-by: Claude:claude-opus-4-7
The dns_db_deleterdataset() removing the corresponding signature
within the iterator is wrong, because it mutates an rdataset
that is not the current one.
The hint feeds the EDNS OPT UDP-size field, which has no effect on TCP
transport. Avoid the dns_adb_getudpsize() lookup when the query is
already pinned to TCP.
Assisted-by: Claude:claude-opus-4-7
Two exits from fctx_try() landed at DNS_R_SERVFAIL without attaching
DNS_EDE_NOREACHABLEAUTH: when fctx_getaddresses() returned a non-success,
non-wait status, and when every candidate addrinfo was unusable
(over-quota or filtered) after a restart.
With the new TCP fallback actually firing, those paths are now reached
by serve-stale and similar scenarios in which the auth is unreachable.
Attach the EDE so SERVFAIL responses keep carrying the same operator
signal that the timeout-based exit paths already produce.
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
The TCP-fallback fix in the previous commits means a query that would
previously have timed out on UDP now actually escalates to TCP, and a
TCP-side failure surfaces a non-ISC_R_TIMEDOUT result code to
query_usestale(). The trigger for DNS_DBFIND_STALESTART was previously
narrowed to ISC_R_TIMEDOUT, so the stale-refresh-time window stopped
opening for those clients.
Broaden the condition to any failure that has already cleared the
upstream DUPLICATE/DROP filtering in query_usestale() — the spirit of
the window is "the resolver tried and could not get a fresh answer",
not "the resolver timed out specifically".
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
Make the decision in fctx_query() before the dispatch is bound so the
chosen transport and the DNS_FETCHOPT_TCP flag agree. The previous
location in resquery_send() ran after the UDP dispatch had already been
attached, so the flag flip had no effect on the wire.
Moving the decision earlier also means FCTX_ADDRINFO_NOEDNS0 servers,
previously exempt, now escalate to TCP too. TCP works regardless of
EDNS state, so this is the intended behaviour.
Assisted-by: Claude:claude-opus-4-7
The retry path in resquery_send() that flipped DNS_FETCHOPT_TCP on a
query whose dispatch had already been bound as UDP in fctx_query() had
no effect on the transport actually used, but did leave a stale TCP
bit visible to downstream consumers (dnstap framing, cookie checks,
the AUTHORITY-NS spoofability guard).
The ineffective code has been removed from resquery_send(). The
TCP fallback functionality will be corrected and restored in the next
commit.
Assisted-by: Claude:claude-opus-4-7