The ans3 custom server instance is created with default_aa=True. Do not
pass the authoritative=True keyword argument to the DnsResponseSend
constructor in CookieHandler.get_responses() as it is redundant.
The ans3 custom server does not have any zones defined, so the responses
passed to its handlers by core isctest.asyncserver code are guaranteed
to be empty. Remove a call to qctx.prepare_new_response() from
CookieHandler.get_responses() as it is redundant.
The RootNSHandler and ExampleNSHandler classes are only equipped to
respond to specific QNAME/QTYPE tuples, not all queries for a specific
QNAME. Turn them into subclasses of QnameQtypeHandler and make them
only respond to QTYPE=NS queries to prevent sending NS responses for
non-NS queries.
- Rewrite cherry-pick references during autorebases
- Fix autorebase error reporting
- Limit post-push pipelines for autorebased branches
- Only autorebase when there is anything to rebase
- Conflate missing commit reference notifications
- Support autorebasing backported security MRs
Merge branch 'michal/autorebase-improvements' into 'main'
See merge request isc-projects/bind9!12024
Autorebasing a backported security fix enables convenient refreshing of
cherry-pick references, which makes it trivial for developers to satisfy
Danger rules just before the merge request is merged. Add a manual CI
job that is only created for backported merge requests targeting
security-* branches.
Instead of creating a separate (potentially lengthy) Danger notification
for every missing commit reference in a backport, produce a single
notification with a list of all unreferenced commit hashes. This makes
Danger output more concise while retaining all the relevant feedback for
the developer.
In an optimistic future, security-* branches will become empty, at least
intermittently. When that happens, there will be nothing left to rebase
on those branches, so when something gets merged into their base
branches, an autorebase will effectively be a fast-forward. While the
existing autorebase logic would handle such a case perfectly fine, it is
prudent to avoid creating a test pipeline after pushing such a
fast-forward update as the code revision getting pushed will have
already been tested by other pipelines. However, the push should still
happen as non-empty downstream autorebased branches may exist and those
will still need to be rebased. Achieve both of these objectives by
checking early whether there is anything to rebase and pushing the
fast-forwarded version of the branch without setting the AUTOREBASE CI
variable if there is not.
Current CI job triggering rules cause a full pipeline to be started
after every push to security-* branches. In this context, "push" means
"branch update", which covers both "git push" invocations and merging a
merge request. Meanwhile, running a test pipeline is only desired after
a rebase; if a branch is fast-forwarded, it means that a merge request
has been merged into it and a pipeline should have already been run for
that merge request itself. Limit resource use by only triggering
pipelines for security-* branches when they are pushed to with a "magic"
CI variable that is only set in autorebase jobs. Leave all the other
triggering rules (for scheduled/manual pipelines) intact.
The logic used for detecting the commit breaking an autorebase does not
work correctly if the offending commit is not the first one applied
during the "reverse rebase". Fix by using REBASE_HEAD instead of
processing the output of "git status" in a convoluted way.
Furthermore, the approach used for identifying the first offending merge
request in the case of a successful autorebase followed by a failed
build only works correctly if the base branch is not autorebased itself.
Since a solution that would work correctly for a branch autorebased on
top of a branch that only moves forward does not work correctly for a
branch autorebased on top of another autorebased branch and vice versa,
accurately identifying the most likely culprit after a successful
autorebase is a very complicated and brittle task. Since reporting no
details at all is arguably better than reporting false details, only
produce a minimal error notification if the build fails after a
successful autorebase.
Each address entry stored by dns_delegset_addaddr() is an
isc_netaddrlink_t, whose size depends on sizeof(void *) via the
ISC_LINK macro (24 bytes of address + two prev/next pointers): 40
bytes on 64-bit, 32 bytes on 32-bit. The hardcoded 4 MB / 8 MB
ranges only held on 64-bit, so dns_deleg_cleanuptests failed on
armv7l with isc_mem_inuse() returning ~3.2 MB.
Express the expected ranges in terms of sizeof(isc_netaddrlink_t)
so they scale with pointer width, and pull the 99999 entry count
out into a NENTRIES macro.
Closeisc-projects/bind9#6012
Merge branch 'mnowak/armv7l-fix-dns_deleg_cleanuptests' into 'main'
See merge request isc-projects/bind9!12061
Each address entry stored by dns_delegset_addaddr() is an
isc_netaddrlink_t, whose size depends on sizeof(void *) via the
ISC_LINK macro (24 bytes of address + two prev/next pointers): 40
bytes on 64-bit, 32 bytes on 32-bit. The hardcoded 4 MB / 8 MB
ranges only held on 64-bit, so dns_deleg_cleanuptests failed on
armv7l with isc_mem_inuse() returning ~3.2 MB.
Express the expected ranges in terms of sizeof(isc_netaddrlink_t)
so they scale with pointer width, and pull the 99999 entry count
out into a NENTRIES macro.
Assisted-by: Claude:claude-opus-4-7
BN_num_bits() returns 0 on NULL input and a negative value on internal
error. The error return value is now properly handled.
Merge branch 'ondrej/fix-BN_num_bits-return-value' into 'main'
See merge request isc-projects/bind9!12057
BN_num_bits() returns 0 when passed NULL and a negative value on
internal error. The OpenSSL wrappers stored the result in a size_t,
so a 0 return falsely satisfied the bit-length check and a negative
return wrapped to a huge value. Capture the int return, reject
non-positive values, then compare against the limit.
A recursive resolver could accept and cache an RRSIG record whose
Type-Covered field names a meta-type (ANY, AXFR, IXFR, MAILA, MAILB),
even though no real RRset of those types ever exists. Such records
are now rejected by the DNS message parser.
Closes#6002
Merge branch '6002-reject-rrsig-covering-meta-types' into 'main'
See merge request isc-projects/bind9!12048
A signature cannot cover a meta-type (NONE, ANY, AXFR, IXFR, MAILB,
MAILA, OPT, TSIG, TKEY); previously such records were cached by the
recursive resolver and collided with negative-cache entries on the
same owner name, corrupting the QP-trie cache.
Assisted-by: Claude:claude-opus-4-7
The `dns_db_deleterdataset()` removing the corresponding signature within the iterator is wrong, because it mutates an rdataset that is not the current one. This has been fixed.
Merge branch 'matthijs-fix-evict-cname-other' into 'main'
See merge request isc-projects/bind9!12047
The dns_db_deleterdataset() removing the corresponding signature
within the iterator is wrong, because it mutates an rdataset
that is not the current one.
When an authoritative server failed to respond to two consecutive
UDP queries in a fetch, named was supposed to retry the next attempt
over TCP but in fact still sent it over UDP. The resolver now
properly switches the transport to TCP on the third attempt to
the same server.
Closes#5529
Merge branch '5529-fix-tcp-fallback-after-udp-timeouts' into 'main'
See merge request isc-projects/bind9!12022
The hint feeds the EDNS OPT UDP-size field, which has no effect on TCP
transport. Avoid the dns_adb_getudpsize() lookup when the query is
already pinned to TCP.
Assisted-by: Claude:claude-opus-4-7
With the resolver now legitimately escalating to TCP after repeated
UDP timeouts to the same authoritative, each lame-server lookup
takes ~50% longer to fail. The recursive-client backlog therefore
peaks a little higher before the fetches-per-server auto-tune drops
the quota below 200.
Bump the upper bound for the burst-against-lame-server and recovery
steps from 200 to 250 to absorb that extra latency. The lower bound
and the final post-recovery target (clients <= 20) are unchanged.
Assisted-by: Claude:claude-opus-4-7
The serve_stale shell suite uses a UDP-only perl mock as its
authoritative server. Now that the resolver escalates to TCP after
repeated UDP timeouts, three steps in serve_stale/tests.sh that
exercise resolver-query-timeout behaviour no longer reach the
timeout — the TCP fallback short-circuits to SERVFAIL via
`connection refused` on the perl mock.
Move those scenarios to a new system test directory
`bin/tests/system/serve_stale_tcp/` that uses a
ControllableAsyncDnsServer mock listening on both UDP and TCP, so
the resolver's TCP path is exercised end-to-end and the original
timing semantics are preserved. Remove the corresponding shell
steps from serve_stale/tests.sh.
Assisted-by: Claude:claude-opus-4-7
The "active sockets" and "queries in progress" assertions previously
required exactly one extra UDP/IPv4 socket and exactly one UDP query in
progress, with no TCP counterpart. That shape held only because the
broken TCP-fallback path left the resolver retrying UDP indefinitely.
With the fix in place, after two UDP timeouts to the same authority the
resolver legitimately escalates to TCP, and a stats snapshot taken
during recursion may catch the in-flight query on either transport.
Count the UDP and TCP counters together so the test reflects the new
correct behaviour.
Assisted-by: Claude:claude-opus-4-7
With the TCP fallback now actually firing after repeated UDP timeouts,
the resolver covers more retry transitions in the same wall-clock
window, and the original 3-second budgets in two steps of the
serve_stale test left no margin: the dig client at +timeout=3 and the
"sleep 3" before re-enabling the upstream both straddled the moment at
which the resolver switched transport, making the asserted outcome
race-prone.
Drop the dig timeout to 2s and the sleep to 1s so each step lands
firmly on one side of the transport switch.
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
Two exits from fctx_try() landed at DNS_R_SERVFAIL without attaching
DNS_EDE_NOREACHABLEAUTH: when fctx_getaddresses() returned a non-success,
non-wait status, and when every candidate addrinfo was unusable
(over-quota or filtered) after a restart.
With the new TCP fallback actually firing, those paths are now reached
by serve-stale and similar scenarios in which the auth is unreachable.
Attach the EDE so SERVFAIL responses keep carrying the same operator
signal that the timeout-based exit paths already produce.
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
The TCP-fallback fix in the previous commits means a query that would
previously have timed out on UDP now actually escalates to TCP, and a
TCP-side failure surfaces a non-ISC_R_TIMEDOUT result code to
query_usestale(). The trigger for DNS_DBFIND_STALESTART was previously
narrowed to ISC_R_TIMEDOUT, so the stale-refresh-time window stopped
opening for those clients.
Broaden the condition to any failure that has already cleared the
upstream DUPLICATE/DROP filtering in query_usestale() — the spirit of
the window is "the resolver tried and could not get a fresh answer",
not "the resolver timed out specifically".
Co-authored-by: Evan Hunt <each@isc.org>
Assisted-by: Claude:claude-opus-4-7
Make the decision in fctx_query() before the dispatch is bound so the
chosen transport and the DNS_FETCHOPT_TCP flag agree. The previous
location in resquery_send() ran after the UDP dispatch had already been
attached, so the flag flip had no effect on the wire.
Moving the decision earlier also means FCTX_ADDRINFO_NOEDNS0 servers,
previously exempt, now escalate to TCP too. TCP works regardless of
EDNS state, so this is the intended behaviour.
Assisted-by: Claude:claude-opus-4-7
The retry path in resquery_send() that flipped DNS_FETCHOPT_TCP on a
query whose dispatch had already been bound as UDP in fctx_query() had
no effect on the transport actually used, but did leave a stale TCP
bit visible to downstream consumers (dnstap framing, cookie checks,
the AUTHORITY-NS spoofability guard).
The ineffective code has been removed from resquery_send(). The
TCP fallback functionality will be corrected and restored in the next
commit.
Assisted-by: Claude:claude-opus-4-7
On a server configured with tkey-gssapi-keytab (or tkey-gssapi-credential),
an authenticated peer could crash named by sending two TKEY DELETE requests
for the same dynamic key in rapid succession. This has been fixed.
Closes#6001
Merge branch '6001-tsig-tkey-delete-uaf' into 'main'
See merge request isc-projects/bind9!12041
Two TSIG-authenticated TKEY DELETE queries for the same dynamic key,
arriving on different worker loops, could each enter
dns_tsigkey_delete() and cause over-decrementing the key refcount.
This has been fixed by making dns_tsigkey_delete() idempotent.
When an RRset is in stale cache, and the authoritative server changes the record type to CNAME, the resolver fails to refresh the stale cache. This has been fixed.
Closes#5302
Merge branch '5302-serve-stale-cname-to-a' into 'main'
See merge request isc-projects/bind9!11758
CNAME and other record types cannot coexist. DNSSEC records are the
exceptions to this rule.
If the answer contains a name with a CNAME, remove existing RRsets at
the same name from the cache.
If the answer contains a name without a CNAME, remove the CNAME RRset
at the same name from the cache.
Add a serve-stale system test case where the authority changes a
CNAME RRset to A (at cname2.stale.test). The CNAME that is in the
cache is stale and should be refreshed. The target A record (at
a2.stale.test) has a longer TTL and is also still in the cache. The
next query should return the refreshed A RRset to the client.
Then the authority changes back the A RRset to CNAME. The A RRset
has become stale and should be refreshed. The next query should
return the refreshed CNAME RRset plus the already cached
a2.stale.test A record.
This test requires ns1 to allow dynamic updates to stale.test, and
prefetch to be disabled. The latter is to ensure the record is not
prefetched, but only refreshed when stale (and logs the expected
"an attempt to refresh the RRset" messages).
Adds a CI job that runs PR-Agent against each merge request opened
from the canonical repository, posting an automated review and
code-improvement suggestions as MR comments. The job is gated to
same-project source branches so the OpenAI key and personal access
token are not exposed to fork pipelines.
Merge branch 'ondrej/add-pr-agent' into 'main'
See merge request isc-projects/bind9!12032
Run PR-Agent's `review` and `improve` commands against each merge
request from the canonical repository, posting an automated review
and code-improvement suggestions as MR comments. The rule restricts
the job to MRs whose source project matches CI_PROJECT_PATH so the
OpenAI key and GitLab personal access token are never exposed to
fork pipelines.
The key-generation tools (tsig-keygen, rndc-confgen) now accept any valid DNS name for key names.
Merge branch 'ondrej/allow-all-valid-keynames' into 'main'
See merge request isc-projects/bind9!12029