Commit graph

45028 commits

Author SHA1 Message Date
Michał Kępień
f3be1bf699
Tweak and reword release notes 2026-02-26 21:17:47 +01:00
Michał Kępień
32fa0c3ff0
Prepare release notes for BIND 9.21.19 2026-02-26 21:17:47 +01:00
Michał Kępień
a02da8cd4c
Generate changelog for BIND 9.21.19 2026-02-26 21:17:47 +01:00
Ondřej Surý
e7e96c7f1f chg: dev: Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound partial
Fisher-Yates shuffle.

The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
introduces fctx_getaddresses_nsorder() to perform an in-place
randomization of indices into a bounded, stack-allocated lookup array
(nsorder) representing the "winning" fetch slots.

The nameserver dataset is now traversed in exactly one sequential pass:
1. Every nameserver is evaluated for local cached data.
2. If the current nameserver's sequential index exists in the randomized
   nsorder array, it is permitted to launch an outgoing network fetch.
3. If not, it is restricted to local lookups via DNS_ADBFIND_NOFETCH.

This guarantees a fair random distribution for outbound queries while
maximizing local cache hits, entirely within O(1) memory and without
the overhead of linked-list pointer shuffling or dynamic allocation.
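
The single-pass scheme above can be sketched in C. This is a minimal illustration, not BIND's fctx_getaddresses_nsorder(). The names MAX_NS, uniform_below(), pick_fetch_slots(), is_winner() and plan_lookups() are hypothetical, and rand() stands in for BIND's own random source. A partial Fisher-Yates pass picks k "winning" indices. A single sequential walk then marks each nameserver as either allowed to fetch or restricted to cache-only lookup, which is the NOFETCH case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bound on the stack-allocated index array. */
#define MAX_NS 64

/* Unbiased integer in [0, upper), upper > 0, via rejection sampling. */
static size_t
uniform_below(size_t upper) {
	unsigned long range = (unsigned long)RAND_MAX + 1;
	unsigned long limit = range - range % upper;
	unsigned long r;

	do {
		r = (unsigned long)rand();
	} while (r >= limit);
	return (size_t)(r % upper);
}

/*
 * Partial Fisher-Yates: after k swaps, order[0..k-1] is a uniformly
 * random k-subset of 0..n-1 (the "winning" fetch slots).
 */
static size_t
pick_fetch_slots(size_t n, size_t k, size_t order[MAX_NS]) {
	assert(n <= MAX_NS);
	if (k > n) {
		k = n;
	}
	for (size_t i = 0; i < n; i++) {
		order[i] = i;
	}
	for (size_t i = 0; i < k; i++) {
		size_t j = i + uniform_below(n - i); /* i <= j < n */
		size_t tmp = order[i];
		order[i] = order[j];
		order[j] = tmp;
	}
	return k;
}

/* Is sequential index idx one of the k winning slots? */
static bool
is_winner(size_t idx, const size_t *order, size_t k) {
	for (size_t i = 0; i < k; i++) {
		if (order[i] == idx) {
			return true;
		}
	}
	return false;
}

/*
 * One sequential pass: every nameserver would be checked against the
 * cache (not modelled here); only winners may go to the network, the
 * rest are cache-only. Returns the number of fetch-enabled entries.
 */
static size_t
plan_lookups(size_t n, size_t k, bool fetch[MAX_NS]) {
	size_t order[MAX_NS];
	size_t winners = pick_fetch_slots(n, k, order);
	size_t nfetch = 0;

	for (size_t idx = 0; idx < n; idx++) {
		fetch[idx] = is_winner(idx, order, winners);
		if (fetch[idx]) {
			nfetch++;
		}
	}
	return nfetch;
}
```

With 23 nameservers and a limit of 20, exactly 20 entries come out fetch-enabled and the other 3 are cache-only.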

Closes #5695

Merge branch '5695-refactor-the-random-NS-selection' into 'main'

See merge request isc-projects/bind9!11604
2026-02-26 07:33:29 +01:00
Colin Vidal
5274e764c4
Add test coverage for nameserver processing limits
Introduce a new system test (nsprocessinglimit) to verify that the
resolver strictly respects outgoing network fetch quotas when presented
with heavily delegated, unresponsive zones.

This test acts as a regression check for the recent Fisher-Yates nameserver
selection refactor.  It sets up an authoritative server delegating a zone
to 23 distinct nameservers (all pointing to unresponsive loopback IPs).

Using dnstap, the test forces a resolution failure and verifies that:
1. The resolver successfully traverses the zone delegation path.
2. The resolver caps the outgoing network queries to the delegated
   nameservers exactly at the processing limit (20 fetches), ensuring
   array boundaries and dynamic fetch quotas are strictly enforced without
   crashing or hanging.
2026-02-26 06:57:54 +01:00
Ondřej Surý
3c33e7d937
Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.

The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates.  The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.

This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.

This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
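
This earlier variant shuffles the whole candidate array and then walks it until the quota is reached. It can be sketched as below. struct ns_entry is a hypothetical stand-in for dns_rdata_t, and MAX_NS, uniform_below(), shuffle_ns() and launch_fetches() are illustrative names, not BIND code. The shuffle is the standard Durstenfeld form of Fisher-Yates, which runs from the last slot down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bound on the stack array of candidates. */
#define MAX_NS 64

/* Hypothetical stand-in for a dns_rdata_t holding an NS target name. */
struct ns_entry {
	const char *name;
};

/* Unbiased integer in [0, upper), upper > 0, via rejection sampling. */
static size_t
uniform_below(size_t upper) {
	unsigned long range = (unsigned long)RAND_MAX + 1;
	unsigned long limit = range - range % upper;
	unsigned long r;

	do {
		r = (unsigned long)rand();
	} while (r >= limit);
	return (size_t)(r % upper);
}

/* Full Fisher-Yates (Durstenfeld) shuffle, from the last slot down. */
static void
shuffle_ns(struct ns_entry *a, size_t n) {
	for (size_t i = n; i > 1; i--) {
		size_t j = uniform_below(i); /* 0 <= j <= i - 1 */
		struct ns_entry tmp = a[i - 1];
		a[i - 1] = a[j];
		a[j] = tmp;
	}
}

/*
 * Walk the shuffled array in order, "launching" one fetch per entry
 * until pending_running reaches fetches_allowed. Returns fetches started.
 */
static size_t
launch_fetches(struct ns_entry *a, size_t n, size_t fetches_allowed) {
	size_t pending_running = 0;

	assert(n <= MAX_NS);
	shuffle_ns(a, n);
	for (size_t i = 0; i < n; i++) {
		if (pending_running >= fetches_allowed) {
			break;
		}
		/* A real resolver would start an address lookup for a[i].name. */
		pending_running++;
	}
	return pending_running;
}
```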
2026-02-26 06:57:53 +01:00
Štěpán Balážik
fafe462c1a fix: ci: Fix generate-tsan-stress-test-configs CI job
In a3d0f43d2 I moved the script that does this to the QA repo and
screwed up the path.

Fix the path and make the job run properly again.

Merge branch 'stepan/fix-tsan-stress' into 'main'

See merge request isc-projects/bind9!11599
2026-02-25 12:13:54 +00:00
Štěpán Balážik
4ed6c4e4e7 Fix generate-tsan-stress-test-configs CI job
In a3d0f43d2 I moved the script that does this to the QA repo and
screwed up the path.

Fix the path and make the job run properly again.
2026-02-25 12:13:02 +00:00
Matthijs Mekking
118ab70b42 fix: nil: Fix log level bug related to keystores
A debug message logging that a PKCS#11 object has been generated was
erroneously logged at error level. This has been fixed.

Merge branch 'matthijs-fix-loglevel-keystore' into 'main'

See merge request isc-projects/bind9!11586
2026-02-25 12:01:01 +00:00
Matthijs Mekking
5bd6322739 Fix log level bug in keystore
A debug message logging that a PKCS#11 object has been generated was
erroneously logged at error level. This has been fixed.
2026-02-25 11:34:07 +01:00
Ondřej Surý
55e9b72e3c fix: usr: Remove deterministic selection of nameserver
When selecting nameserver addresses to be looked up, we were
always selecting them in DNSSEC name order from the start of
the nameserver RRset.  This could lead to resolution failure
despite there being addresses that could be resolved for the
other names.  Use a random starting point when selecting which
names to look up.

Closes #5695
 
Closes #5745

Merge branch '5695-add-random-server-selection' into 'main'

See merge request isc-projects/bind9!11395
2026-02-25 10:05:55 +01:00
Colin Vidal
c67b52684f system test covering NS randomization
Add a randomizens system test which ensures that NSes are randomly
selected.  The test relies on the fact that the `getaddresses_allowed()`
logic won't allow querying more than 3 NSes at the top level.  The
`example.` zone has 4 NSes and the first 3 are lame.  As a result, if
the resolver doesn't randomize the NS selection, it will only query the
first 3, which won't give an answer, and resolution fails.  With
randomization enabled, there is a chance that the resolver queries the
fourth NS and gets the result.
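
Here is a quick worked figure for the randomizens setup. It assumes the resolver picks the 3 permitted nameservers uniformly at random out of the 4. A uniformly random start index with wrap-around gives the same result, because only a start at the first nameserver produces exactly the three lame ones:

```latex
P(\text{fourth NS queried})
  = 1 - \frac{\binom{3}{3}}{\binom{4}{3}}
  = 1 - \frac{1}{4}
  = \frac{3}{4}
```

Without randomization, the fourth NS is never queried, so this probability is 0. With randomization, each attempt reaches it with probability 3/4 under the stated assumption.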
2026-02-25 09:31:14 +01:00
Mark Andrews
b78052119a Remove deterministic selection of nameserver
When selecting nameserver addresses to be looked up, we were
always selecting them in DNSSEC name order from the start of
the nameserver RRset.  This could lead to resolution failure
despite there being addresses that could be resolved for the
other names.  Use a random starting point when selecting which
names to look up.
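
The "random starting point" approach is the "random start index and wrap around" logic that the later Fisher-Yates refactor replaced. It can be sketched as follows. select_from_random_start() and uniform_below() are hypothetical names, and rand() stands in for BIND's random source. Compared with always starting at index 0, every nameserver now has a chance to fall inside the selected window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Unbiased integer in [0, upper), upper > 0, via rejection sampling. */
static size_t
uniform_below(size_t upper) {
	unsigned long range = (unsigned long)RAND_MAX + 1;
	unsigned long limit = range - range % upper;
	unsigned long r;

	do {
		r = (unsigned long)rand();
	} while (r >= limit);
	return (size_t)(r % upper);
}

/*
 * Hypothetical sketch: choose up to `limit` of `n` nameservers by starting
 * at a random index and wrapping around the end of the set. Writes the
 * chosen indices to out[] and returns how many were chosen.
 */
static size_t
select_from_random_start(size_t n, size_t limit, size_t *out) {
	size_t start, count = 0;

	assert(out != NULL);
	if (n == 0) {
		return 0;
	}
	start = uniform_below(n);
	for (size_t i = 0; i < n && count < limit; i++) {
		out[count++] = (start + i) % n;
	}
	return count;
}
```

The chosen indices are always a contiguous (circular) window. This is why the commit message for the later refactor describes the Fisher-Yates version as statistically sounder.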
2026-02-25 09:27:03 +01:00
Ondřej Surý
22181ec1b8 sec: usr: Remove purged adb names and entries from SIEVE list immediately
Both expire_name() and expire_entry() use the isc_async mechanism to remove
the names and entries from the SIEVE-LRU lists on the matching isc_loop.

Under certain circumstances, this could lead to double-counting the
purged names/entries when purging the SIEVE-LRU lists under the overmem
condition.  This would cause too little memory to be cleaned up, and the
ADB would then never recover from the overmem condition, leading to an
OOM crash of named.

Merge branch 'ondrej/fix-runaway-memory-in-adb' into 'main'

See merge request isc-projects/bind9!11544
2026-02-25 07:29:23 +01:00
Ondřej Surý
46cfac0825
Remove purged adb names and entries from SIEVE list immediately
Both `expire_name()` and `expire_entry()` use the isc_async mechanism to
remove names and entries from the SIEVE-LRU lists on the matching
isc_loop.

Under heavy load, when the cleaning mechanism had not yet had a chance
to kick in, this delay could lead to double-counting the purged names
and entries when purging the SIEVE-LRU lists during an overmem
condition.  This would result in insufficient memory being cleaned up,
causing the ADB to never recover from the overmem condition and leading
to an OOM crash of `named`.

This patch resolves the issue by bypassing the async queue and executing
the removal synchronously if the target loop matches the current
isc_loop().
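
The fix runs the removal immediately when the caller is already on the target loop and defers it otherwise. The pattern can be sketched in C. Everything here is a hypothetical stand-in: struct loop replaces isc_loop_t, current_loop replaces isc_loop(), and run_or_defer(), loop_enqueue() and unlink_entry() are illustrative names. The point is that a synchronous removal takes the entry off the list at once, so a purge pass that follows cannot count it a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity of a loop's pending-job queue. */
#define QUEUE_MAX 16

typedef void (*job_fn)(void *arg);

/* Hypothetical stand-in for an event loop with an async job queue. */
struct loop {
	job_fn jobs[QUEUE_MAX];
	void *args[QUEUE_MAX];
	size_t njobs;
};

/* Hypothetical "current loop" of the calling thread. */
static struct loop *current_loop = NULL;

static void
loop_enqueue(struct loop *target, job_fn fn, void *arg) {
	assert(target->njobs < QUEUE_MAX);
	target->jobs[target->njobs] = fn;
	target->args[target->njobs] = arg;
	target->njobs++;
}

/* Run the job now when already on the target loop; otherwise defer it. */
static void
run_or_defer(struct loop *target, job_fn fn, void *arg) {
	if (target == current_loop) {
		fn(arg); /* synchronous: the entry leaves the list immediately */
	} else {
		loop_enqueue(target, fn, arg); /* runs later on its own loop */
	}
}

/* Example job: unlink an entry from a (simulated) SIEVE-LRU list. */
struct entry {
	int on_list;
};

static void
unlink_entry(void *arg) {
	struct entry *e = arg;
	e->on_list = 0;
}
```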
2026-02-25 07:26:38 +01:00
Ondřej Surý
91286490c1 fix: usr: Importing invalid SKR file might corrupt stack memory
If a BIND 9 administrator imports an invalid SKR file, a local stack
buffer in the import function might overflow.  This could lead to
memory corruption on the stack and ultimately a server crash.
This has been fixed.

ISC would like to thank mcsky23 for bringing this bug to our attention.

Closes #5758

Merge branch '5758-fix-stack-overflow-via-rndc-skr-import' into 'main'

See merge request isc-projects/bind9!11578
2026-02-24 19:45:16 +01:00
Ondřej Surý
a82773ea89 Add system test that imports an invalid SKR file
Try to import an invalid SKR file and check whether named is still
alive.  This test only detects the overflow under ASAN.
2026-02-24 19:44:57 +01:00
Ondřej Surý
8ab4827a0c Importing invalid SKR file might overflow the stack buffer
If an invalid SKR file is imported, reading the time from the token
buffer might overflow the buffer on the local stack.  This has been
fixed by removing the intermediate buffer and parsing the lexer token
directly.
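
Parsing "the lexer token directly" means checking the token's length before reading it, rather than copying it into a fixed-size local buffer. The idea can be sketched in C. The timestamp format used here is an assumption: 14-digit YYYYMMDDHHMMSS, the format of RRSIG validity fields in RFC 4034. struct stamp, read_digits() and parse_stamp() are hypothetical names. The sketch does no calendar range checking.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Broken-down timestamp parsed from a YYYYMMDDHHMMSS token. */
struct stamp {
	int year, month, day, hour, min, sec;
};

/* Read exactly `width` decimal digits from s into *out. */
static bool
read_digits(const char *s, size_t width, int *out) {
	int v = 0;

	for (size_t i = 0; i < width; i++) {
		if (!isdigit((unsigned char)s[i])) {
			return false;
		}
		v = v * 10 + (s[i] - '0');
	}
	*out = v;
	return true;
}

/*
 * Parse the token where the lexer left it (text + length). The length
 * is checked before any character is read, so an oversized token is
 * rejected instead of being copied into a fixed-size local buffer.
 */
static bool
parse_stamp(const char *text, size_t len, struct stamp *t) {
	assert(text != NULL && t != NULL);
	if (len != 14) {
		return false;
	}
	return read_digits(text, 4, &t->year) &&
	       read_digits(text + 4, 2, &t->month) &&
	       read_digits(text + 6, 2, &t->day) &&
	       read_digits(text + 8, 2, &t->hour) &&
	       read_digits(text + 10, 2, &t->min) &&
	       read_digits(text + 12, 2, &t->sec);
}
```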
2026-02-24 19:44:57 +01:00
Michal Nowak
df2c66d964 fix: ci: Use LLVM 21 for "trixie"
We forgot to change this when bumping CLANG_VERSION.

Merge branch 'mnowak/fix-clang-version-on-trixie' into 'main'

See merge request isc-projects/bind9!11596
2026-02-24 19:05:05 +01:00
Michal Nowak
3cbc052602
Use LLVM 21 for "trixie"
We forgot to change this when bumping CLANG_VERSION.
2026-02-24 17:54:24 +01:00
Ondřej Surý
d8be931c49 chg: dev: Invalid NSEC3 can cause OOB read of the isdelegation() stack
When .next_length is longer than NSEC3_MAX_HASH_LENGTH, it causes a
harmless out-of-bounds read of the isdelegation() stack.  This has been
fixed.

Closes #5749

Merge branch '5749-fix-OOB-read-in-isdelegation' into 'main'

See merge request isc-projects/bind9!11553
2026-02-24 16:00:35 +01:00
Mark Andrews
e83a182056
Test maximum length NSEC3 hash detection
Adds text and wire format unit tests to verify the newly enforced
maximum NSEC3 hash length constraints.  These tests ensure that hash
lengths up to the 39-byte maximum are accepted, while larger sizes
correctly fail.
2026-02-24 15:00:10 +01:00
Mark Andrews
f030bc6756
Remove invalid REQUIRE in NSEC3 fromstruct method
The NSEC3 fromstruct method only worked for hash type 1
when it should work for all hash types.
2026-02-24 14:58:18 +01:00
Ondřej Surý
7b737bc1c4
Add tests for NSEC3 invalid length
Adds a static system test that fails to load an NSEC3 record with an
invalid next part length.  Additionally, introduces a dynamic test using
a crafted authoritative DNS proxy to inject invalid NSEC3 records on the
fly to test runtime behavior.
2026-02-24 14:57:58 +01:00
Mark Andrews
3801d0ebbf
Enforce NSEC3 record consistency
NSEC3 hashes are required to fit within a single DNS label.  Since the
base32hex encoding, used without pad characters, carries 5 bits per label
byte, the maximum hash size is floor(63 * 5 / 8) = 39 bytes.

This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
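
The arithmetic and the per-algorithm rule can be sketched in C. This is an illustration, not BIND's implementation. The macro and function names are hypothetical. The lower bound of 1 octet comes from the Hash Length field range in RFC 5155.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A single DNS label holds at most 63 octets (RFC 1035). */
#define LABEL_MAX 63
/* base32hex without padding: 5 bits per character, rounded down to bytes. */
#define NSEC3_HASH_MAX ((LABEL_MAX * 5) / 8) /* 315 / 8 = 39 */

/* NSEC3 hash algorithm 1 is SHA-1 (RFC 5155), a 20-byte digest. */
#define NSEC3_ALG_SHA1 1
#define SHA1_DIGEST_LEN 20

_Static_assert(NSEC3_HASH_MAX == 39, "63 * 5 / 8 rounds down to 39");

/* Exact length for a known algorithm, bounded length otherwise. */
static bool
nsec3_hash_len_ok(unsigned int alg, size_t len) {
	if (alg == NSEC3_ALG_SHA1) {
		return len == SHA1_DIGEST_LEN;
	}
	/* RFC 5155 requires at least one octet of hash. */
	return len >= 1 && len <= NSEC3_HASH_MAX;
}
```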
2026-02-24 14:57:22 +01:00
Ondřej Surý
67b4fb56e4
Invalid NSEC3 can cause OOB read of the isdelegation() stack
When .next_length is longer than NSEC3_MAX_HASH_LENGTH, it causes a
harmless out-of-bounds read of the isdelegation() stack.  This patch
fixes the issue by skipping NSEC3 records with an oversized hash length
during validation.
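
The guard can be sketched in C: skip the record before any fixed-size stack buffer is touched. The sketch is hypothetical and simplified. struct nsec3_rec and count_matching_next() are illustrative names, and the real isdelegation() logic is not reproduced. The next_length value comes from the wire, so it is treated as untrusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSEC3_HASH_MAX 39 /* floor(63 * 5 / 8) */

/* Hypothetical, simplified view of a parsed NSEC3 record. */
struct nsec3_rec {
	const unsigned char *next;
	size_t next_length; /* comes from the wire: untrusted */
};

/*
 * Count records whose next hashed owner equals `target`. Each hash is
 * copied into a fixed-size stack buffer, so an oversized next_length
 * is skipped before that buffer is used.
 */
static size_t
count_matching_next(const struct nsec3_rec *recs, size_t n,
		    const unsigned char *target, size_t target_len) {
	size_t matches = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned char hash[NSEC3_HASH_MAX];
		size_t len = recs[i].next_length;

		if (len > sizeof(hash)) {
			continue; /* invalid record: skip it */
		}
		memcpy(hash, recs[i].next, len);
		if (len == target_len && memcmp(hash, target, len) == 0) {
			matches++;
		}
	}
	return matches;
}
```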
2026-02-24 14:56:29 +01:00
Ondřej Surý
d4ec8ebee8 fix: usr: Fail DNSKEY validation when supported but invalid DS is found
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms.  When the parent has both supported and
unsupported algorithm in the DS record, the validator would treat the
supported DS algorithm as insecure when validating DNSKEY records
instead of BOGUS.  This has not security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.

Closes #5757

Merge branch '5757-fix-mixed-algorithm-DS-handling' into 'main'

See merge request isc-projects/bind9!11580
2026-02-23 20:57:50 +01:00
Ondřej Surý
46f15f4f9d
Add test for mixed unsupported DS records
Add a system test that has one invalid DS record with a supported
algorithm and one DS record with an unsupported algorithm.  Both DNSKEY
and A queries must fail with SERVFAIL.
2026-02-23 19:53:48 +01:00
Ondřej Surý
f983a64152
Fail DNSKEY validation when supported but invalid DS is found
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms.  When the parent has both supported and
unsupported algorithm in the DS record, the validator would treat the
supported DS algorithm as insecure when validating DNSKEY records
instead of BOGUS.  This has not security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
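
The intended outcome follows the general DNSSEC rule in RFC 4035, Section 5.2: a delegation is treated as insecure only when none of the DS algorithms is supported. The sketch below is a simplified, hypothetical model of that rule, not the validator's code. struct ds, enum ds_result and dnskey_result() are illustrative names. Unsupported DS records are ignored. If at least one supported DS exists but none of them authenticates the DNSKEY RRset, the result is BOGUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ds_result { RESULT_SECURE, RESULT_INSECURE, RESULT_BOGUS };

/* Hypothetical, simplified view of one DS record. */
struct ds {
	bool alg_supported; /* can we validate with this algorithm? */
	bool matches;       /* does it authenticate a signing DNSKEY? */
};

static enum ds_result
dnskey_result(const struct ds *ds, size_t n) {
	bool any_supported = false;

	assert(ds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!ds[i].alg_supported) {
			continue; /* unsupported algorithm: ignored */
		}
		any_supported = true;
		if (ds[i].matches) {
			return RESULT_SECURE;
		}
	}
	/* Only when no DS is usable may the zone be treated as insecure. */
	return any_supported ? RESULT_BOGUS : RESULT_INSECURE;
}
```

The first case in the usage below, with one invalid supported DS and one unsupported DS, is the mixed-algorithm situation that the regression mishandled.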
2026-02-23 11:34:43 +01:00
Matthijs Mekking
2c67f8bbca fix: usr: Clear serve-stale flags when following the CNAME chains
A stale answer could have been served in case of multiple upstream
failures when following the CNAME chains.  This has been fixed.

Closes #5751

Merge branch '5751-clear-staleflags-in-CNAME-chains' into 'main'

See merge request isc-projects/bind9!11558
2026-02-23 07:50:48 +00:00
Ondřej Surý
d46277b398 Clear serve-stale flags when following the CNAME chains
A stale answer or SERVFAIL could have been served in case of multiple
upstream failures when following the CNAME chains. This has been fixed.
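
The flag leak can be illustrated with a toy model in C. Each hop of a CNAME chain may fall back to stale data. Clearing the stale flag before each hop stops one hop's fallback from marking later, fresh hops as stale. Everything here is hypothetical. OPT_STALEOK stands in for BIND's DNS_DBFIND_STALEOK, and struct hop and follow_chain() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option bit standing in for DNS_DBFIND_STALEOK. */
#define OPT_STALEOK 0x1u

struct hop {
	bool fresh_available; /* upstream answered for this name */
	bool is_cname;        /* answer is a CNAME to the next hop */
};

/*
 * Follow a CNAME chain and return how many hops were answered from
 * stale data. The stale flag is reset at the start of every hop, so
 * it only applies to the hop that actually needed the fallback.
 */
static size_t
follow_chain(const struct hop *chain, size_t n) {
	unsigned int options = 0;
	size_t stale_hops = 0;

	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		options &= ~OPT_STALEOK; /* the fix: do not carry it over */
		if (!chain[i].fresh_available) {
			options |= OPT_STALEOK;
		}
		if ((options & OPT_STALEOK) != 0) {
			stale_hops++;
		}
		if (!chain[i].is_cname) {
			break;
		}
	}
	return stale_hops;
}
```

Without the reset line, a stale CNAME followed by a fresh target would count both hops as stale. This mirrors variant 2 of the tests below.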
2026-02-23 08:07:12 +01:00
Matthijs Mekking
c32de7df95 Test serve-stale with upstream zones and CNAMEs
Three variants of YWH-PGM40640-56: Stale/Wrong DNS Data Served via
CNAME Flag Leak (DNS_DBFIND_STALEOK persistence) are presented in
GitLab issue #5751. All these variants have been converted to system
tests.

Variant 1 forwards source.stale to another server that provides a
CNAME record, while the resolver is authoritative for target.stale.
The CNAME points to a non-existing name. A stale CNAME record should
result in a stale NXDOMAIN (instead of SERVFAIL).

Variant 2 forwards both source.stale and target.stale to other servers.
This time the CNAME points to an A RRset. If the source.stale server
is not available (and stale-answer-client-timeout is off), the cached
CNAME should be followed and pick up the fresh RRset (instead of the
stale A RRset).

Variant 3 is similar to variant 2, but this time the CNAME points to
a non-existing name again. After flushing the target, BIND should
return a stale NXDOMAIN (instead of SERVFAIL).
2026-02-23 08:07:12 +01:00
Ondřej Surý
6d3252bbe6 new: doc: Provide guidelines for tool-generated content
In the last few years, the capabilities of coding tools have exploded.
As those capabilities have expanded, contributors and maintainers have
more and more questions about how and when to apply those capabilities.

Add new documentation to guide contributors on how to best use BIND 9
development tools, new and old.

In short: Please show your work and make sure your contribution is
easy to review.

This has been adopted from the Linux Kernel guidelines.

Merge branch 'ondrej/clarify-the-use-of-tools' into 'main'

See merge request isc-projects/bind9!11447
2026-02-23 07:23:25 +01:00
Ondřej Surý
3fe2215afb Provide guidelines for tool-generated content
In the last few years, the capabilities of coding tools have exploded.
As those capabilities have expanded, contributors and maintainers have
more and more questions about how and when to apply those capabilities.

Add new documentation to guide contributors on how to best use BIND 9
development tools, new and old.

In short: Please show your work and make sure your contribution is
easy to review.

This has been adopted from the Linux Kernel guidelines.
2026-02-23 07:23:10 +01:00
Ondřej Surý
bc0b26439b chg: doc: Add examples to the dig man page
Add a set of short examples at the end of the dig manual page to help new or infrequent users figure out the most basic ways to use dig.

Merge branch 'examples' into 'main'

See merge request isc-projects/bind9!11577
2026-02-22 17:20:50 +01:00
Julia Evans
8972ed9424 Add examples to the dig man page
The goal here is to help new or infrequent users figure out the most
basic ways to use dig.

Notes on the choice of examples:

* I wrote examples that users can copy and paste exactly as is, without
  having to come up with an appropriate IP address or domain name to use.
  The one exception is the `dig -x` example which uses an IP from the
  example range.
* `dig +noall +answer` is included because learning about `+noall +answer`
  was life-changing for me. I've heard from others that they find it
  helpful too, and it's pretty hard to infer from the man page as it
  stands that it might be useful.
* I thought about adding `+trace` but left it out because 5 examples was
  already starting to feel like a lot.
2026-02-22 11:03:10 -05:00
Ondřej Surý
92d3c7d011 fix: nil: Cleanup setting netmgr ports from isc_managers_create()
This is now redundant, as the default ports are already set in
isc_netmgr_create().

Merge branch 'ondrej/mr11569-followup-cleanup' into 'main'

See merge request isc-projects/bind9!11576
2026-02-20 17:25:04 +01:00
Ondřej Surý
10270f6b42
Cleanup setting netmgr ports from isc_managers_create()
This is now redundant, as the default ports are already set in
isc_netmgr_create().
2026-02-20 16:37:44 +01:00
Štěpán Balážik
6907c4f349 chg: ci: Rework linting of Python code
With the Python version bumped to 3.10 and the dependency situation cleared with !11415, it is now time to run linters and formatters on more parts of the Python code that was previously skipped or ignored.

Switch configuration of the various Python-adjacent tools to `pyproject.toml` to ensure that the same configuration is used in CI and locally.

See the individual commits for details on settings changed and linters added. 

Tweaks to type checking and enabling more `ruff` lints will come in subsequent MRs.

Prerequisites:
- bind9-qa!160
- images!442

Merge branch 'stepan/python-tooling' into 'main'

See merge request isc-projects/bind9!11499
2026-02-20 14:59:10 +00:00
Štěpán Balážik
8b0a8dbd8e Add ruff job to CI
Run the linter on Python code changes in CI.
2026-02-20 15:17:32 +01:00
Štěpán Balážik
ced002c4ab Replace deprecated typing imports
More specific modules (like collections.abc) can now be used.

Generated with: ruff check --extend-select UP035 --fix
2026-02-20 15:17:32 +01:00
Štěpán Balážik
d3186c7038 Clean up imports of dnspython modules
Add a pylint plugin that enforces:
  - There is no bare `import dns` statement.
  - All `dns.<module>` used are explicitly imported.
  - There are no unused `dns.<module>` imports.

Fix all the imports to conform with this check.
2026-02-20 15:17:32 +01:00
Štěpán Balážik
1d5924c82f Replace Optional["T"] with "T | None"
In Python 3.10 strings don't support the | operator, so ruff doesn't
attempt to fix these. Quote the entire type specification to avoid the
typing.Optional import.

Alternatives I considered:
- leaving it as is (only use of Optional in the code base)
- using `from __future__ import annotations` (replacing one import with
  another one)
2026-02-20 15:17:32 +01:00
Štěpán Balážik
fe38515ad0 Replace Optional[T] with T | None
Generated with: ruff check --extend-select UP045 --fix && black .
2026-02-20 15:17:32 +01:00
Štěpán Balážik
cdb7428431 Remove the rest of Union usages by hand
These require some manual changes.
2026-02-20 15:17:32 +01:00
Štěpán Balážik
ce9c9a1a9c Replace Union[S, T] with S | T
Generated with: ruff check --extend-select UP007 --fix && black .
2026-02-20 15:17:32 +01:00
Štěpán Balážik
790745da18 Built-in types are now subscriptable
Generated with: ruff check --extend-select UP006 --fix
2026-02-20 15:17:32 +01:00
Štěpán Balážik
08f5e5ebd1 Remove superfluous 'pylint: disable' directives
Some of these have been fixed already, fix the rest.
2026-02-20 15:17:32 +01:00
Štěpán Balážik
b00f16f026 Remove unused imports
Generated with: ruff check --extend-select F401 --fix
2026-02-20 15:17:32 +01:00
Štěpán Balážik
7178c97e5c Set pytestmark explicitly in rollover* and nsec3* tests
Importing pytestmark confuses static analysis tools as they flag it as
unused.
2026-02-20 15:17:32 +01:00