Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound partial
Fisher-Yates shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
introduces fctx_getaddresses_nsorder() to perform an in-place
randomization of indices into a bounded, stack-allocated lookup array
(nsorder) representing the "winning" fetch slots.
The nameserver dataset is now traversed in exactly one sequential pass:
1. Every nameserver is evaluated for local cached data.
2. If the current nameserver's sequential index exists in the randomized
nsorder array, it is permitted to launch an outgoing network fetch.
3. If not, it is restricted to local lookups via DNS_ADBFIND_NOFETCH.
This guarantees a fair random distribution for outbound queries while
maximizing local cache hits, entirely within O(1) memory and without
the overhead of linked-list pointer shuffling or dynamic allocation.
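The selection described above can be sketched roughly as follows. This is an illustrative Python model only, not the C code in the resolver: the names pick_fetch_slots() and plan_lookups() are hypothetical, a Python list stands in for the bounded stack array, and a boolean stands in for the DNS_ADBFIND_NOFETCH flag.

```python
import random


def pick_fetch_slots(n_servers, max_fetches, rng=random):
    """Partial Fisher-Yates: choose up to max_fetches distinct indices.

    Only the first k positions are shuffled, so the work is bounded by
    the number of fetch slots, not by the number of nameservers.
    """
    order = list(range(n_servers))
    k = min(max_fetches, n_servers)
    for i in range(k):
        # Swap position i with a uniformly chosen position in [i, n).
        j = rng.randrange(i, n_servers)
        order[i], order[j] = order[j], order[i]
    return set(order[:k])


def plan_lookups(nameservers, max_fetches, rng=random):
    """Single sequential pass: every name gets a local lookup, but only
    the 'winning' indices may start an outgoing fetch."""
    winners = pick_fetch_slots(len(nameservers), max_fetches, rng)
    plan = []
    for idx, ns in enumerate(nameservers):
        may_fetch = idx in winners  # False models a NOFETCH-style lookup
        plan.append((ns, may_fetch))
    return plan
```

With 23 nameservers and 20 fetch slots, every name is still visited once and in order, and exactly 20 of them are marked as allowed to fetch.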
Closes #5695
Merge branch '5695-refactor-the-random-NS-selection' into 'main'
See merge request isc-projects/bind9!11604
Introduce a new system test (nsprocessinglimit) to verify that the
resolver strictly respects outgoing network fetch quotas when presented
with heavily delegated, unresponsive zones.
This test acts as a regression check for the recent Fisher-Yates nameserver
selection refactor. It sets up an authoritative server delegating a zone
to 23 distinct nameservers (all pointing to unresponsive loopback IPs).
Using dnstap, the test forces a resolution failure and verifies that:
1. The resolver successfully traverses the zone delegation path.
2. The resolver caps the outgoing network queries to the delegated
nameservers exactly at the processing limit (20 fetches), ensuring
array boundaries and dynamic fetch quotas are strictly enforced without
crashing or hanging.
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.
This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.
This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
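As a rough illustration of the shuffle-then-walk approach in this commit, here is a minimal Python sketch. It is not the resolver code: launch_until_quota() is a hypothetical name, and the pending_running and fetches_allowed parameters only mirror the counters named above.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """In-place Fisher-Yates shuffle; each permutation is equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform index in [0, i]
        items[i], items[j] = items[j], items[i]


def launch_until_quota(nameservers, fetches_allowed, pending_running=0,
                       rng=random):
    """Walk the shuffled copy and 'launch' fetches until the quota is hit."""
    order = list(nameservers)  # stands in for the bounded array
    fisher_yates_shuffle(order, rng)
    launched = []
    for ns in order:
        if pending_running >= fetches_allowed:
            break
        launched.append(ns)
        pending_running += 1
    return launched
```

If some fetches are already running when the walk starts, fewer new ones are launched, which is the "dynamic quota" behaviour the message refers to.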
In a3d0f43d2 I moved the script that does this to the QA repo and
screwed up the path.
Fix the path and make the job run properly again.
Merge branch 'stepan/fix-tsan-stress' into 'main'
See merge request isc-projects/bind9!11599
A debug message reporting that a PKCS#11 object has been generated was
erroneously logged at error level. This has been fixed.
Merge branch 'matthijs-fix-loglevel-keystore' into 'main'
See merge request isc-projects/bind9!11586
When selecting nameserver addresses to be looked up we were
always selecting them in DNSSEC name order from the start of
the nameserver RRset. This could lead to resolution failure
despite there being addresses that could be resolved for the
other names. Use a random starting point when selecting which
names to look up.
Closes #5695
Closes #5745
Merge branch '5695-add-random-server-selection' into 'main'
See merge request isc-projects/bind9!11395
Add randomizens system test which ensures that NS are randomly selected.
The test relies on the fact that the `getaddresses_allowed()` logic won't
allow querying more than 3 NS at the top level. The `example.` zone has
4 NS and the first 3 are lame. As a result, if the resolver doesn't
randomize the NS selection, it will only query the first 3, which
won't give an answer, and fail. With randomization enabled, there is a
chance that the resolver queries the fourth NS and gets the result.
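To put a number on "there is a chance", here is a small Python sketch. It assumes, purely for illustration, that the resolver picks 3 of the 4 NS uniformly at random; the actual distribution used by the resolver is not stated here, and chance_of_reaching() is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations


def chance_of_reaching(good, n_servers, allowed):
    """Probability that index `good` is among `allowed` servers chosen
    uniformly at random out of `n_servers` (illustrative assumption)."""
    subsets = list(combinations(range(n_servers), allowed))
    hits = sum(1 for subset in subsets if good in subset)
    return Fraction(hits, len(subsets))


# Four NS (indices 0..3), only index 3 answers, three queries allowed.
print(chance_of_reaching(good=3, n_servers=4, allowed=3))
```

Under that assumption a single attempt reaches the working NS three times out of four, whereas a fixed "first three" order never does.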
When selecting nameserver addresses to be looked up we were
always selecting them in DNSSEC name order from the start of
the nameserver RRset. This could lead to resolution failure
despite there being addresses that could be resolved for the
other names. Use a random starting point when selecting which
names to look up.
Both expire_name() and expire_entry() use the isc_async mechanism to
remove the names and entries from the SIEVE-LRU lists on the matching
isc_loop. Under certain circumstances, this could lead to double counting
the purged names/entries when purging the SIEVE-LRU lists under the
overmem condition. This would cause not enough memory to be cleaned up
and the ADB would then never recover from the overmem condition, leading
to an OOM crash of named.
Merge branch 'ondrej/fix-runaway-memory-in-adb' into 'main'
See merge request isc-projects/bind9!11544
Both `expire_name()` and `expire_entry()` use the isc_async mechanism to
remove names and entries from the SIEVE-LRU lists on the matching
isc_loop.
Under heavy load when the cleaning mechanism didn't have the chance to
kick in yet, this delay could lead to double-counting the purged names
and entries when purging the SIEVE-LRU lists during an overmem
condition. This would result in insufficient memory being cleaned up,
causing the ADB to never recover from the overmem condition and leading
to an OOM crash of `named`.
This patch resolves the issue by bypassing the async queue and executing
the removal synchronously if the target loop matches the current
isc_loop().
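The effect can be illustrated with a toy Python model. This is not the ADB code: Loop, expire() and purge() are hypothetical stand-ins, a plain list stands in for a SIEVE-LRU list, and the model assumes the double counting comes from the purge loop seeing an entry again before its deferred removal has run.

```python
from collections import deque


class Loop:
    """Stand-in for an event loop with a queue of deferred callbacks."""

    def __init__(self):
        self.deferred = deque()

    def run_deferred(self):
        while self.deferred:
            self.deferred.popleft()()


def expire(lru, name, target_loop, current_loop, sync_if_same_loop):
    def remove():
        if name in lru:
            lru.remove(name)

    if sync_if_same_loop and target_loop is current_loop:
        remove()  # new behaviour: remove right away on the same loop
    else:
        target_loop.deferred.append(remove)  # old behaviour: defer


def purge(lru, loop, wanted, sync_if_same_loop):
    """Expire from the head until `wanted` entries have been counted."""
    counted = 0
    while counted < wanted and lru:
        expire(lru, lru[0], loop, loop, sync_if_same_loop)
        counted += 1
    loop.run_deferred()
    return counted
```

In this model, purging 3 of 4 entries with deferred removal counts the same head entry three times and frees only one; with synchronous removal on the matching loop, three entries are counted and three are freed.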
If a BIND 9 administrator imports an invalid SKR file, a buffer on the
local stack in the import function might overflow. This could lead to
memory corruption on the stack and ultimately a server crash.
This has been fixed.
ISC would like to thank mcsky23 for bringing this bug to our attention.
Closes #5758
Merge branch '5758-fix-stack-overflow-via-rndc-skr-import' into 'main'
See merge request isc-projects/bind9!11578
If an invalid SKR file is imported, reading the time from the token
buffer might overflow the buffer on the local stack. This has been
fixed by removing the intermediate buffer and parsing the lexer token
directly.
We forgot to change this when bumping CLANG_VERSION.
Merge branch 'mnowak/fix-clang-version-on-trixie' into 'main'
See merge request isc-projects/bind9!11596
When .next_length is longer than NSEC3_MAX_HASH_LENGTH, it causes a
harmless out-of-bounds read of the isdelegation() stack. This has been
fixed.
Closes #5749
Merge branch '5749-fix-OOB-read-in-isdelegation' into 'main'
See merge request isc-projects/bind9!11553
Adds text and wire format unit tests to verify the newly enforced
maximum NSEC3 hash length constraints. These tests ensure that hash
lengths up to the 39-byte maximum are accepted, while larger sizes
correctly fail.
Adds a static system test that fails to load an NSEC3 record with an
invalid next part length. Additionally, introduces a dynamic test using
a crafted authoritative DNS proxy to inject invalid NSEC3 records on the
fly to test runtime behavior.
NSEC3 hashes are required to fit within a single DNS label. Since each
label character encodes 5 bits (base32hex without pad characters), the
maximum hash size is floor(63*5/8) = 39 bytes.
This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
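The arithmetic and the two rules can be sketched in Python as follows. This is an illustration only, not the rdata parsing code: hash_length_ok() and KNOWN_DIGEST_LENGTHS are hypothetical names, and any lower bound on the length is left out because it is not described here.

```python
MAX_LABEL_LEN = 63           # maximum length of one DNS label, in characters
BASE32_BITS_PER_CHAR = 5     # base32hex carries 5 bits per character

# floor(63 * 5 / 8): whole bytes that fit into one unpadded label
NSEC3_MAX_HASH_LENGTH = (MAX_LABEL_LEN * BASE32_BITS_PER_CHAR) // 8

NSEC3_HASH_SHA1 = 1                             # NSEC3 hash algorithm 1
KNOWN_DIGEST_LENGTHS = {NSEC3_HASH_SHA1: 20}    # SHA-1 digests are 20 bytes


def hash_length_ok(algorithm, length):
    """Exact digest length for known algorithms, 39-byte cap otherwise."""
    expected = KNOWN_DIGEST_LENGTHS.get(algorithm)
    if expected is not None:
        return length == expected
    return length <= NSEC3_MAX_HASH_LENGTH
```

For example, a 39-byte hash passes for an unknown algorithm but not for SHA-1, and a 40-byte hash is rejected in both cases.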
When .next_length is longer than NSEC3_MAX_HASH_LENGTH, it causes a
harmless out-of-bounds read of the isdelegation() stack. This patch
fixes the issue by skipping NSEC3 records with an oversized hash length
during validation.
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms. When the parent has both supported and
unsupported algorithms in the DS records, the validator would treat the
supported DS algorithm as insecure instead of BOGUS when validating
DNSKEY records. This has no security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
Closes #5757
Merge branch '5757-fix-mixed-algorithm-DS-handling' into 'main'
See merge request isc-projects/bind9!11580
Add a system test that has one invalid DS record with a supported
algorithm and one DS record with an unsupported algorithm. Both DNSKEY
and A queries must fail with SERVFAIL.
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms. When the parent has both supported and
unsupported algorithms in the DS records, the validator would treat the
supported DS algorithm as insecure instead of BOGUS when validating
DNSKEY records. This has no security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
A stale answer could have been served in case of multiple upstream
failures when following CNAME chains. This has been fixed.
Closes #5751
Merge branch '5751-clear-staleflags-in-CNAME-chains' into 'main'
See merge request isc-projects/bind9!11558
Three variants of YWH-PGM40640-56: Stale/Wrong DNS Data Served via
CNAME Flag Leak (DNS_DBFIND_STALEOK persistence) are presented in
GitLab issue #5751. All these variants have been converted to system
tests.
Variant 1 forwards source.stale to another server, that provides a
CNAME record, while the resolver is authoritative for target.stale.
The CNAME points to a non-existing name. A stale CNAME record should
result in a stale NXDOMAIN (instead of SERVFAIL).
Variant 2 forwards both source.stale and target.stale to other servers.
This time the CNAME points to an A RRset. If the source.stale server
is not available (and stale-answer-client-timeout is off), the cached
CNAME should be followed and pick up the fresh RRset (instead of the
stale A RRset).
Variant 3 is similar to variant 2, but this time the CNAME points to
a non-existing name again. After flushing the target, BIND should
return a stale NXDOMAIN (instead of SERVFAIL).
In the last few years, the capabilities of coding tools have exploded.
As those capabilities have expanded, contributors and maintainers have
more and more questions about how and when to apply those capabilities.
Add new documentation to guide contributors on how to best use BIND 9
development tools, new and old.
In short: Please show your work and make sure your contribution is
easy to review.
This has been adopted from the Linux Kernel guidelines.
Merge branch 'ondrej/clarify-the-use-of-tools' into 'main'
See merge request isc-projects/bind9!11447
In the last few years, the capabilities of coding tools have exploded.
As those capabilities have expanded, contributors and maintainers have
more and more questions about how and when to apply those capabilities.
Add new documentation to guide contributors on how to best use BIND 9
development tools, new and old.
In short: Please show your work and make sure your contribution is
easy to review.
This has been adopted from the Linux Kernel guidelines.
Add a set of short examples at the end of the dig manual page to help new or infrequent users figure out the most basic ways to use dig.
Merge branch 'examples' into 'main'
See merge request isc-projects/bind9!11577
The goal here is to help new or infrequent users figure out the most
basic ways to use dig.
Notes on the choice of examples:
* I wrote examples that users can copy and paste exactly as is, without
having to come up with an appropriate IP address or domain name to use.
The one exception is the `dig -x` example which uses an IP from the
example range.
* `dig +noall +answer` is here because learning about `+noall +answer`
was life-changing for me, I've heard from others that they find it
helpful too, and it's pretty hard to infer from the man page as is
that it might be useful.
* I thought about adding `+trace` but left it out because 5 examples was
already starting to feel like a lot.
This is now redundant, as the default ports are already set in
isc_netmgr_create().
Merge branch 'ondrej/mr11569-followup-cleanup' into 'main'
See merge request isc-projects/bind9!11576
With the Python version bumped to 3.10 and the dependency situation cleared with !11415, it is now time to run linters and formatters on more parts of the Python code that were previously skipped or ignored.
Switch configuration of the various Python-adjacent tools to `pyproject.toml` to ensure that the same configuration is used in CI and locally.
See the individual commits for details on settings changed and linters added.
Tweaks to type checking and enabling more `ruff` lints will come in subsequent MRs.
Prerequisites:
- bind9-qa!160.
- images!442
Merge branch 'stepan/python-tooling' into 'main'
See merge request isc-projects/bind9!11499
Add a pylint plugin that enforces:
- There is no bare `import dns` statement.
- All `dns.<module>` used are explicitly imported.
- There are no unused `dns.<module>` imports.
Fix all the imports to conform with this check.
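The three rules can be approximated with the standard `ast` module, as in the sketch below. This is not the pylint plugin itself: check_dns_imports() is a hypothetical helper, it only looks at the first component after `dns.`, and it ignores aliased imports and `from dns import ...` forms.

```python
import ast


def check_dns_imports(source):
    """Return a list of problems with `dns` imports in `source`."""
    tree = ast.parse(source)
    imported, used, problems = set(), set(), []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "dns":
                    problems.append(f"line {node.lineno}: bare 'import dns'")
                elif alias.name.startswith("dns.") and alias.asname is None:
                    imported.add(alias.name.split(".")[1])
        elif (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "dns"
        ):
            # Matches the `dns.<module>` part of e.g. dns.name.from_text()
            used.add(node.attr)
    for mod in sorted(used - imported):
        problems.append(f"dns.{mod} used but not imported")
    for mod in sorted(imported - used):
        problems.append(f"dns.{mod} imported but unused")
    return problems
```

A file containing `import dns.name` followed by `dns.name.from_text(...)` yields no problems, while a bare `import dns` or a stray `import dns.message` is reported.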
In Python 3.10 strings don't support the | operator, so ruff doesn't
attempt to fix these. Quote the entire type specification to avoid the
typing.Optional import.
Alternatives I considered:
- leaving it as is (only use of Optional in the code base)
- using `from __future__ import annotations` (replacing one import with
another one)
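A minimal Python illustration of the quoting issue follows. The Zone class and partially_quoted_fails() are hypothetical and not taken from the code base; they only show why the whole annotation has to be quoted when it contains a forward reference.

```python
class Zone:
    # The whole annotation is one string, so nothing is evaluated when
    # the method is defined and no typing.Optional import is needed.
    def parent(self) -> "Zone | None":
        return None


def partially_quoted_fails():
    """Show that quoting only the class name does not work: the | operator
    is then applied to a plain str at runtime."""
    try:
        "Zone" | None
    except TypeError:
        return True
    return False
```

The annotation stays a plain string in `Zone.parent.__annotations__` until a type checker or `typing.get_type_hints()` resolves it.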