LoadScopeScheduling._split_scope() uses rsplit("::", 1) to
extract the test file scope from a node ID. When parametrized
test values contain "::" (IPv6 addresses like "cafe:cafe::cafe"
or "::1"), the split lands inside the parameter instead of at
the .py:: boundary. This creates spurious scopes that get
assigned to different workers, each triggering a full fixture
setup (starting named instances).
Override _split_scope() in conftest.py to split on ".py::"
which is unambiguous.
Six tests in synthrecord/tests_synthrecord.py are affected.
A verification script is included in util/.
Assisted-by: Claude:claude-opus-4-7
Update PRIORITY_TESTS with the 10 longest-running test
scopes measured from CI (job 7468217). These get scheduled
first so that with --dist=loadscope they land on separate
workers instead of piling up at the end.
Also fix "serve-stale/" to "serve_stale/" to match the
actual directory name, and add a startup check that fails
if any PRIORITY_TESTS entry does not match an existing
directory.
Assisted-by: Claude:claude-opus-4-7
named-checkconf should reject a template that has options that must be
non-zero (max-refresh-time, max-retry-time, min-refresh-time,
min-retry-time).
rndc addzone with a zone that refers to such template should fail
cleanly.
Test that flushing the delegdb via `rndc flush` preserves its
configured size limit. The test checks delegdb watermarks after
`named` startup, flushes caches, and verifies that the delegdb
watermarks are correctly restored afterwards.
To distinguish between the previous `delegdb` memory contexts and the
new ones, we need to know exactly when the previous `delegdb` memory
contexts are removed (this is not immediate, since those are removed
during RCU reclamation phase). A trace is therefore added when a memory
context is destroyed, if `ISC_MEM_DEBUGTRACE` is set.
The `rndc flush` command flushes the delegdb by deleting the
existing database and creating a new one. In the process, the
delegdb was losing its configured size limit; as a result, once
flushed, the delegdb size became unbounded.
This is now fixed by using `dns_delegdb_getconfig()` to back up the
current configuration before instantiating a new delegdb, then
restoring it with `dns_delegdb_setconfig()`.
Instead of having independent APIs to configure various aspects of the
delegdb (i.e. cache size, other settings that may come up later), a
single configuration struct is passed to `dns_delegdb_setconfig()`, which
internally does all the plumbing. To avoid relying on
atomics/synchronization, `dns_delegdb_setconfig()` must be called from
exclusive mode (for now).
The configuration can be retrieved at any time (not necessarily from
exclusive mode) using `dns_delegdb_getconfig()`. This is useful, for
instance, to flush the delegdb without losing its parameters.
After SIG and NXT lost their special handling, KEY remained the only
RFC 2535-era type still receiving coexistence allowances: KEY
alongside CNAME at the same owner, KEY answered from the parent side
of a zone cut, KEY kept across CNAME eviction in the cache. RFC 3755
retains type 25 only for SIG(0) and TKEY transaction signatures, and
neither relies on those allowances in practice. The in-tree comment
that flagged the RFC 3007 parent-side carve-out as "unclear" predicted
this cleanup.
Zones that publish CNAME and KEY at the same owner — already invalid
under RFC 2181 — now fail to load. System test fixtures are updated
accordingly, and a new test asserts that SIG, NXT, and KEY records
pick up covering RRSIGs when their zone is signed.
RFC 3755 retired SIG and NXT in favour of RRSIG and NSEC. BIND still
warned about them at zone load, refused them in dynamic updates,
parsed SIG with a non-zero "type covered" field as a signature on an
RRset, and tracked them via dns_rdatatype_issig(). Those carve-outs
were the sole path that made the GL#5818 crash class reachable.
Treat both types as ordinary unknown rdata: they load, transfer, sign
and answer like any other record, and dynamic updates carry them
through the generic path. SIG(0) is unaffected; its message-parsing
carve-out is preserved.
Tests that exercise instrumentation, log output, or other behaviour
that only exists in developer builds (the gcc:almalinux9:amd64 CI job
sets -Ddeveloper=disabled to guard against such accidental coupling)
can now decorate themselves with isctest.mark.with_developer to skip on
non-developer builds.
Assisted-by: Claude:claude-opus-4-7
System tests can check FEATURE_DEVELOPER in the environment, but the
recommended pattern is the with_developer pytest marker added next.
Assisted-by: Claude:claude-opus-4-7
System tests that depend on log output, instrumentation, or other
behaviour only present in developer builds can use this probe to detect
the build configuration at runtime.
Assisted-by: Claude:claude-opus-4-7
An AAAA query for a non-existent name into a view that combines
nxdomain-redirect with dns64 used to abort named via the DNS64
fallback in query_nodata(). The new module exercises all three
documented entry paths into query_redirect(): the authoritative
NXDOMAIN path (ns7, tripping INSIST(!is_zone) in
query_notfound()), the recursive NCACHENXRRSET path (ns8,
tripping REQUIRE in dns_rdataset_first() on a disassociated
rdataset), and the synth-from-dnssec path (ns10 validating
against ns9's signed root, with a primer A query so the second
AAAA reaches query_redirect() via query_coveringnsec()). ns9
serves as a neutral upstream so the cached and synthesized
negatives land real NXRRSETs.
Assisted-by: Claude:claude-opus-4-7
The previous loader was a FileSystemLoader rooted at $srcdir, which
allowed any system test to include any other test's templates -- a
wider scope than intended. Every existing cross-test include already
targets _common/, so make that the only path.
ChoiceLoader + PrefixLoader keeps the existing '_common/foo.j2' path
convention working without changes to call sites. The '_common/'
prefix is deliberately kept rather than dropping it by rooting the
FileSystemLoader at _common/ directly:
- It signals at the include site that the file is a shared
template, not a sibling of the current test; readers don't need
to know the loader configuration to understand where the file
lives.
- It prevents shadowing: a test-local 'controls.conf.j2' would
not collide with the shared one, and the unqualified name keeps
its test-local meaning.
- It makes the dependency greppable: 'grep -rl _common/'
identifies every test that consumes shared snippets.
Assisted-by: Claude:claude-opus-4-7
Add commonly used zone-related data (config snippet and zone file
snippets) as templates which can be reused by filling in different data.
Adjust the isctest.template.Zone to use filepath argument rather than
filename for clarity.
Omit extra newlines when combining and including templates.
Adjust the xfer/ns8/small.db.j2 so it doesn't trim the endline twice
(as that would join the two subsequent records on the same line).
In some cases, the template data might need to be set directly in the
jinja2 templates using `{% set %}`. Expose the template dataclasses to
the templates so we can use these existing classes, rather than creating
ad-hoc data containers.
If a template is being rendered into a directory that represents a
nameserver (e.g. "ns1"), include a nameserver-specific information in
the data - variable called "ns" which has information about the
nameserver this file belongs to.
Ensure the "ns" variable is only exposed to the template when rendered,
without affecting the environment variables (always work with a copy of
the env_vars).
Extend the Nameserver to generate the default IPv4/IPv6 values, add NSX
values for the predefined nameservers (there are 11 of them, as per
bin/tests/system/ifconfig.sh.in max value). Add the missing ns11
fixture.
Extend the Zone to derive the zone filename by default, unless
specified.
Adjust the existing uses of these classes to utilize the simplified
defaults.
Add a variant of checking configuration where inline-signing is
enabled on the secondary, requiring the 'file' entry. This time,
inline-signing is implicitly enabled via dnssec-policy.
Previously, the server would crash if it received a query with an ID
close to 65535 in the badmessageid case, as adding 50 to it would not
fit in uint16.
This was an oversight in porting it from Perl to Python in
f9ed3650ac.
dnspython's RRSIG.to_text() converts the signature inception/expiration
fields by calling time.gmtime(), which on 32-bit platforms raises
OverflowError for values past 2038-01-19 (INT32_MAX). Several DNSSEC
test fixtures use far-future expirations: the precomputed RRSIGs in
the dnssec test's rsasha1.example.db.in zone expire in 2093, ans4 of
the chain test hardcodes 2090, and ans10 of the dnssec test uses
2**32-1 (year 2106). Whenever a response carrying such an RRSIG is
formatted with str()/to_text() the overflow propagates out and either
fails the test (when triggered in isctest.query's debug logging) or
kills the asyncserver-based ans* server (when triggered in its
response logger), which in turn cascades into "Failed to stop
servers" teardown errors and SERVFAIL responses for subsequent tests.
Wrap the to_text() calls in isctest/query.py and the str(response)
call in asyncserver's _log_response() with try/except OverflowError,
falling back to a placeholder message. The conversions are only used
for debug logging, so losing the human-readable form there does not
affect what the tests actually validate.
Assisted-by: Claude:claude-opus-4-7
The nsec3hash tool parsed its algorithm, flags, and iterations
arguments with atoi(), then range-checked the result. For values
that overflow int during digit-by-digit accumulation, atoi() is
undefined; in practice on musl libc the modular wrap leaves
n == 0, which silently passes the "iterations > 0xffffU" check.
On Alpine Linux this made nsec3hash succeed with iterations
treated as 0 for inputs like 4294967296 (2^32).
The latent bug only surfaced when the recent image rebuild pulled
in Hypothesis 6.152.9 (2026-05-19), which unified the distribution
used for bounded and unbounded integers() strategies. The new
smoother distribution explores the 2^32 boundary on unbounded
ranges like integers(min_value=65536); earlier versions did not
reach there, so test_nsec3hash_too_many_iterations only started
failing on Alpine after the image refresh.
Replace the three atoi() calls with isc_parse_uint8 /
isc_parse_uint16, which uniformly reject overflow, trailing
garbage, leading sign, and non-numeric input across libc
implementations. As a side effect, error messages now include
the offending argument and a specific reason ("out of range" vs
"not a valid number").
Assisted-by: Claude:claude-opus-4-7
Make the handlers defined in bin/tests/system/resend_loop/ans3/ans.py
follow canonical naming conventions used in other system tests. Keep
all server initialization code in the main() function.
Since the _get_cookie() function is only used by the CookieHandler
class, make the former a method of the latter to keep related logic
close in the source code.
The "len(cookie.server) == 0" condition is superfluous for the
"resend_loop" system test, so remove it. Add a return type annotation
to the _get_cookie() function.
The "yield" keyword does not cause a function to return. By design,
get_responses() may yield multiple DNS responses in a single call. As
currently implemented, CookieHandler.get_responses() sends two responses
to each client query that does not contain a COOKIE option. Make the
logic in that method consistent with code comments by only sending one
response to every query - either SERVFAIL or BADCOOKIE, never both.
The ans3 custom server instance is created with default_aa=True. Do not
pass the authoritative=True keyword argument to the DnsResponseSend
constructor in CookieHandler.get_responses() as it is redundant.
The ans3 custom server does not have any zones defined, so the responses
passed to its handlers by core isctest.asyncserver code are guaranteed
to be empty. Remove a call to qctx.prepare_new_response() from
CookieHandler.get_responses() as it is redundant.
The RootNSHandler and ExampleNSHandler classes are only equipped to
respond to specific QNAME/QTYPE tuples, not all queries for a specific
QNAME. Turn them into subclasses of QnameQtypeHandler and make them
only respond to QTYPE=NS queries to prevent sending NS responses for
non-NS queries.
A signature cannot cover a meta-type (NONE, ANY, AXFR, IXFR, MAILB,
MAILA, OPT, TSIG, TKEY); previously such records were cached by the
recursive resolver and collided with negative-cache entries on the
same owner name, corrupting the QP-trie cache.
Assisted-by: Claude:claude-opus-4-7
With the resolver now legitimately escalating to TCP after repeated
UDP timeouts to the same authoritative, each lame-server lookup
takes ~50% longer to fail. The recursive-client backlog therefore
peaks a little higher before the fetches-per-server auto-tune drops
the quota below 200.
Bump the upper bound for the burst-against-lame-server and recovery
steps from 200 to 250 to absorb that extra latency. The lower bound
and the final post-recovery target (clients <= 20) are unchanged.
Assisted-by: Claude:claude-opus-4-7
The serve_stale shell suite uses a UDP-only perl mock as its
authoritative server. Now that the resolver escalates to TCP after
repeated UDP timeouts, three steps in serve_stale/tests.sh that
exercise resolver-query-timeout behaviour no longer reach the
timeout — the TCP fallback short-circuits to SERVFAIL via
`connection refused` on the perl mock.
Move those scenarios to a new system test directory
`bin/tests/system/serve_stale_tcp/` that uses a
ControllableAsyncDnsServer mock listening on both UDP and TCP, so
the resolver's TCP path is exercised end-to-end and the original
timing semantics are preserved. Remove the corresponding shell
steps from serve_stale/tests.sh.
Assisted-by: Claude:claude-opus-4-7
The "active sockets" and "queries in progress" assertions previously
required exactly one extra UDP/IPv4 socket and exactly one UDP query in
progress, with no TCP counterpart. That shape held only because the
broken TCP-fallback path left the resolver retrying UDP indefinitely.
With the fix in place, after two UDP timeouts to the same authority the
resolver legitimately escalates to TCP, and a stats snapshot taken
during recursion may catch the in-flight query on either transport.
Count the UDP and TCP counters together so the test reflects the new
correct behaviour.
Assisted-by: Claude:claude-opus-4-7
With the TCP fallback now actually firing after repeated UDP timeouts,
the resolver covers more retry transitions in the same wall-clock
window, and the original 3-second budgets in two steps of the
serve_stale test left no margin: the dig client at +timeout=3 and the
"sleep 3" before re-enabling the upstream both straddled the moment at
which the resolver switched transport, making the asserted outcome
race-prone.
Drop the dig timeout to 2s and the sleep to 1s so each step lands
firmly on one side of the transport switch.
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
Make the decision in fctx_query() before the dispatch is bound so the
chosen transport and the DNS_FETCHOPT_TCP flag agree. The previous
location in resquery_send() ran after the UDP dispatch had already been
attached, so the flag flip had no effect on the wire.
Moving the decision earlier also means FCTX_ADDRINFO_NOEDNS0 servers,
previously exempt, now escalate to TCP too. TCP works regardless of
EDNS state, so this is the intended behaviour.
Assisted-by: Claude:claude-opus-4-7
The retry path in resquery_send() that flipped DNS_FETCHOPT_TCP on a
query whose dispatch had already been bound as UDP in fctx_query() had
no effect on the transport actually used, but did leave a stale TCP
bit visible to downstream consumers (dnstap framing, cookie checks,
the AUTHORITY-NS spoofability guard).
The ineffective code has been removed from resquery_send(). The
TCP fallback functionality will be corrected and restored in the next
commit.
Assisted-by: Claude:claude-opus-4-7
Add a serve-stale system test case where the authority changes a
CNAME RRset to A (at cname2.stale.test). The CNAME that is in the
cache is stale and should be refreshed. The target A record (at
a2.stale.test) has a longer TTL and is also still in the cache. The
next query should return the refreshed A RRset to the client.
Then the authority changes back the A RRset to CNAME. The A RRset
has become stale and should be refreshed. The next query should
return the refreshed CNAME RRset plus the already cached
a2.stale.test A record.
This test requires ns1 to allow dynamic updates to stale.test, and
prefetch to be disabled. The latter is to ensure the record is not
prefetched, but only refreshed when stale (and logs the expected
"an attempt to refresh the RRset" messages).
TSIG key names need to be any valid DNS name so that update-policy
"self" rules work with arbitrary names. Replace the
alnum+'.'+'-'+'_' charset filter in the key-generation tools with a
dns_name_fromstring() validity check.
The cache verification in steps 11 and 15 checks that the TTL has
decreased from its initial value to confirm the response was served
from cache, but the sleep between the two queries was missing. Both
queries could complete within the same second, leaving the TTL
unchanged and causing the test to incorrectly conclude the entry was
not cached.