An assertion failure could be triggered when an RRSIG was cached for a type that already had a cached NODATA record, which could happen when the records on the upstream nameservers did not match. This has been fixed.
Closes #5633
Merge branch '5633-evict-related-rrsig-when-adding-negative-header' into 'main'
See merge request isc-projects/bind9!11228
During refactoring, a condition that prevented caching RRSIGs for
types that already have a cached NODATA record was changed in an
invalid way. This was caught later when a cached NODATA(type) +
RRSIG(type) pair was found in the cache and caused an assertion failure.
Fix and simplify the condition that prevents adding such RRSIGs.
Previously, we evicted the RRSIG(type) only when we were changing an
existing header from positive to negative. Move the eviction routine
for the RRSIG to a common path, so the RRSIG also gets evicted when we
are adding a new negative header for a specific type.
The :any:`remote-servers` clause enables the following pattern using a named ``server-list``:
remote-servers a { 1.2.3.4; ... };
remote-servers b { a key foo; };
However, such a configuration was wrongly rejected with an "unexpected token 'foo'" error. It is now accepted.
Closes #5646
Merge branch '5646-fix-named-remote-servers-key-tls' into 'main'
See merge request isc-projects/bind9!11252
If a `key` or `tls` is associated with an IP address inside a server-list,
only the existence of the `tls` in the configuration was checked. Also,
if a `key` or `tls` is associated with a named server-list inside a
server-list, there was no check at all.
Add a check to make sure a `key` is defined in the configuration,
as well as checks for `key` and `tls` when used on a named
server-list.
`check.c` only checks that `remote-servers`, `primaries`, etc. are not
duplicated inside the configuration file, but does not check the
correctness of their definitions. This commit fixes this by calling
`validate_remotes()` for each `remote-servers` (and its aliases),
which validates the correctness of the definition itself (this is the
same call used to validate other cases like `also-notify`, etc.).
Function `named_config_getipandkeylist()` processes nested lists by
overwriting its current local variables and jumping back to the
beginning of the list processing. To go back to the previous state and
process the remaining items of the current list, a "stack" array is
used to save and restore the next list element and associated values.
This makes the logic quite complex and error prone. Instead, this commit
changes the logic by recursing into the nested list (while sharing a
state between all the invocations). The processing is fundamentally
identical, but instead of "manually" handling the stack to go back to
the previous state (and process remaining elements of the current list),
it takes advantage of recursion.
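The difference between the two approaches can be sketched in Python. This is only an illustration, not the code of `named_config_getipandkeylist()`: the `LISTS` table, the `@name` reference notation, and both function names are hypothetical, and keys and TLS settings are left out.

```python
# Hypothetical data model: each named list holds either address strings
# or references to other named lists, written here as "@name".
LISTS = {
    "a": ["10.0.0.1", "@b", "10.0.0.4"],
    "b": ["10.0.0.2", "@c"],
    "c": ["10.0.0.3"],
}


def flatten_with_stack(name):
    """Iterative version: the position in the outer list is saved by hand."""
    out, seen = [], {name}
    stack = []                          # saved (entries, index) pairs
    entries, i = LISTS[name], 0
    while True:
        if i == len(entries):
            if not stack:
                return out
            entries, i = stack.pop()    # resume the outer list
            continue
        item = entries[i]
        i += 1
        if item.startswith("@"):
            nested = item[1:]
            if nested in seen:          # already processed, skip it
                continue
            seen.add(nested)
            stack.append((entries, i))  # remember where to resume
            entries, i = LISTS[nested], 0
        else:
            out.append(item)


def flatten_recursive(name, out=None, seen=None):
    """Recursive version: the call stack keeps the position for us.

    `out` and `seen` are the state shared between all invocations.
    """
    if out is None:
        out, seen = [], set()
    seen.add(name)
    for item in LISTS[name]:
        if item.startswith("@"):
            if item[1:] not in seen:
                flatten_recursive(item[1:], out, seen)
        else:
            out.append(item)
    return out
```

Both functions visit the addresses in the same order; the recursive one simply has no bookkeeping for saving and restoring the current position.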
The following case
remote-servers foo { 10.53.0.5; };
remote-servers bar { foo key fookey; };
did not work: the `fookey` was silently ignored. No matter how `bar` was
used, the server `10.53.0.5` wouldn't be contacted using the TSIG key
`fookey`. The problem is the same for the `tls` property.
The cause of the problem was that when `named_config_getipandkeylist()`
reached a named server-list (here, `foo`), it modified the current
context in order to immediately process what is inside `foo`, but forgot
to look at the fields `key` and `tls`, to associate those with `foo`
addresses.
Fix the problem by wrapping the `key` and `tls` from the "caller" list
inside the existing `lists` struct which is used to figure out if a
list is already processed or not. That way, the `key` and `tls` values
can be read when adding the addresses of the nested list.
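The intended behaviour can be sketched as follows. This is a simplified illustration, not the actual fix: the `Entry` class, the `LISTS` table and `resolve()` are hypothetical, the inherited values are passed as function arguments rather than stored in the `lists` struct, and letting a nested entry's own `key` or `tls` take precedence over the inherited one is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    target: str                  # an IP address, or the name of another list
    key: Optional[str] = None
    tls: Optional[str] = None


# The configuration from the example above, in the hypothetical model.
LISTS = {
    "foo": [Entry("10.53.0.5")],
    "bar": [Entry("foo", key="fookey")],
}


def resolve(name, key=None, tls=None, out=None, seen=None):
    """Expand a named list into (address, key, tls) tuples.

    `key` and `tls` come from the "caller" list entry that referenced
    this list; they apply to every address that has no value of its own.
    """
    if out is None:
        out, seen = [], set()
    seen.add(name)
    for entry in LISTS[name]:
        entry_key = entry.key or key    # assumption: own value wins
        entry_tls = entry.tls or tls
        if entry.target in LISTS:       # reference to a named server-list
            if entry.target not in seen:
                resolve(entry.target, entry_key, entry_tls, out, seen)
        else:
            out.append((entry.target, entry_key, entry_tls))
    return out
```

In this model, expanding `bar` yields `10.53.0.5` paired with `fookey`, while expanding `foo` directly yields the same address with no key.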
Even though `remote-servers` now allows using a named server-list with
`key` (or `tls`), the `key` or `tls` attached to a named server-list is
not used when configuring the server.
For instance,
remote-servers foo { 10.53.0.5; };
also-notify { foo key fookey; };
won't use `fookey`.
Add a system test highlighting the problem.
The remote-servers clause enables the following pattern:
remote-servers a { 1.2.3.4; ... };
remote-servers b { a key foo; };
However, `check.c` was explicitly throwing an error if a `key` or `tls`
was provided after a named server-list. Remove this check, as this is a
valid use case.
:iscman:`named` could terminate unexpectedly when reconfiguring or
reloading if client-side TLS transport was in use (for example,
when forwarding queries to a DoT server). This has been fixed.
Closes #5653
Merge branch '5653-tlsctx_cache-reference-bug-fix' into 'main'
See merge request isc-projects/bind9!11295
When named is being reconfigured, it detaches from the old
'isc_tlsctx_cache_t' TLS context cache object and creates a
new one. This can cause an assertion failure within the
resolver when the object is destroyed while still in use,
because the resolver is using the object without getting
attached to it.
Add an attach/detach so that the 'isc_tlsctx_cache_t' doesn't
get destroyed while still being in use.
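The attach/detach pattern can be illustrated with a small reference-counting model. The classes below are hypothetical stand-ins, not the ISC API; they only show why a user of the object needs its own reference.

```python
class TlsCtxCache:
    """Stand-in for a reference-counted object (hypothetical)."""

    def __init__(self):
        self.refs = 1            # the creator holds the first reference
        self.destroyed = False

    def attach(self):
        assert not self.destroyed, "attach after destroy"
        self.refs += 1
        return self

    def detach(self):
        assert self.refs > 0
        self.refs -= 1
        if self.refs == 0:
            self.destroyed = True   # last reference gone, free the object


class Resolver:
    """A user of the cache that takes its own reference."""

    def __init__(self, cache):
        self.cache = cache.attach()

    def shutdown(self):
        self.cache.detach()
        self.cache = None


def demo():
    old = TlsCtxCache()
    resolver = Resolver(old)
    old.detach()                 # reconfiguration drops the server's reference
    alive_while_in_use = not old.destroyed
    resolver.shutdown()          # the resolver drops the last reference
    return alive_while_in_use, old.destroyed
```

Without the `attach()` call in `Resolver.__init__`, the `detach()` during reconfiguration would bring the count to zero while the resolver still holds a pointer to the object.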
Resolution loops (e.g. when resolving or validating ns1.example.com itself requires resolving ns1.example.com) were sometimes not detected, leading to a spurious 10-second delay. This has been fixed and such loops are now properly detected.
Closes #3033, #5578
Merge branch '5578-tracker-parent-fetch' into 'main'
See merge request isc-projects/bind9!11138
Maintain the relationship between the parent and child fetch and, when
creating a new child fetch, properly check for resolution loops in
which the new fetch would join one of its parents' fetch contexts.
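A minimal sketch of such a check, assuming each fetch keeps a pointer to its parent and that a fetch context is identified by name and type only (the real resolver may use more fields). The `Fetch` class and `would_loop()` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fetch:
    name: str
    rdtype: str
    parent: Optional["Fetch"] = None


def _normalize(name):
    # DNS names compare case-insensitively; ignore a trailing dot.
    return name.lower().rstrip(".")


def would_loop(parent, name, rdtype):
    """Return True if a child fetch for (name, rdtype) started on behalf
    of `parent` would join the fetch of `parent` or one of its ancestors."""
    wanted = _normalize(name)
    node = parent
    while node is not None:
        if _normalize(node.name) == wanted and node.rdtype == rdtype:
            return True
        node = node.parent
    return False
```

For example, if resolving `ns1.example.com`/A started a fetch for `example.com`/NS, a further child fetch for `ns1.example.com`/A would be reported as a loop immediately instead of waiting for a timeout.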
In !9155, the QNAME minimization was changed to not leak the query type
to the parent name server. This violates RFC 9156 Section 3, step (3)
and it is not necessary. It also breaks some (weird) authoritative DNS
setups, especially when CNAMEs are involved. Also there is really no
privacy leak with query type.
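The rule from RFC 9156 Section 3, step (3) can be sketched as follows: intermediate queries use a substitute query type, but once the minimized name equals the original QNAME, the original query is sent unchanged. This is a simplified model that ignores referrals, the DS special case and label limits; `minimised_queries()` and its `hidden_qtype` parameter are hypothetical, and the type BIND uses for intermediate queries is not stated here.

```python
def minimised_queries(qname, qtype, ancestor, hidden_qtype="A"):
    """Return the (name, type) pairs sent when walking from `ancestor`
    (the closest known zone cut) down to `qname`, one label at a time."""
    labels = qname.rstrip(".").split(".")
    known = [label for label in ancestor.rstrip(".").split(".") if label]
    if known and labels[-len(known):] != known:
        raise ValueError("ancestor must be a suffix of qname")

    queries = []
    # Intermediate names: the original query type is hidden.
    for depth in range(len(known) + 1, len(labels)):
        child = ".".join(labels[-depth:]) + "."
        queries.append((child, hidden_qtype))
    # The minimized name now equals the original QNAME: resolve the
    # original query as normal, with the original query type.
    queries.append((".".join(labels) + ".", qtype))
    return queries
```

Only the last query carries the original type, so the parent name servers see the full name together with the real type only when they would have to answer for that exact name anyway.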
Closes #5661
Merge branch '5661-dont-minimize-when-QNAME-matches-original-QNAME' into 'main'
See merge request isc-projects/bind9!11293
Add the isctest.kasp.Key.into_ta() method, which converts the key into a
DS / DNSKEY trust anchor for the BIND config. Add a shared template
trusted.conf.j2 which can be linked to in tests to create the trust
anchor configuration from the trust anchor data returned from the
bootstrap() function.
This is basically a Python replacement for the keyfile_to_static_ds (and
friends) from the conf.sh shell framework.
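As a rough illustration of what such a conversion produces, the sketch below formats entries of a `trust-anchors` clause from already-parsed fields. It is not the `into_ta()` implementation or the `trusted.conf.j2` template: both function names are hypothetical and the key and digest strings in the usage are placeholders, not real data.

```python
ANCHOR_KINDS = ("static-key", "initial-key", "static-ds", "initial-ds")


def trust_anchor_line(name, kind, fields, data):
    """Format one trust anchor entry.

    For the *-key kinds, `fields` is (flags, protocol, algorithm) and
    `data` is the base64 public key; for the *-ds kinds, `fields` is
    (key tag, algorithm, digest type) and `data` is the hex digest.
    """
    if kind not in ANCHOR_KINDS:
        raise ValueError(f"unknown trust anchor kind: {kind}")
    numbers = " ".join(str(field) for field in fields)
    return f'"{name}" {kind} {numbers} "{data}";'


def trust_anchors_clause(entries):
    """Wrap formatted entries in a trust-anchors { ... }; clause."""
    body = "\n".join("    " + trust_anchor_line(*entry) for entry in entries)
    return "trust-anchors {\n" + body + "\n};\n"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(trust_anchors_clause([
        (".", "static-key", (257, 3, 13), "PLACEHOLDERKEY"),
        ("example.", "static-ds", (12345, 13, 2), "PLACEHOLDERDIGEST"),
    ]))
```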
Merge branch 'nicki/pytest-add-trust-anchor-template' into 'main'
See merge request isc-projects/bind9!11201
Previously, a DNSKEY string from the keyfile was returned. This made the
function brittle for further processing, as the string would have to be
split up and concatenated, and the TTL could be missing, making string
indices context-dependent.
Parse the DNSKEY rrset into a proper dnspython object and return it.
This makes the output more predictable and reliable, as all the
necessary parsing is done by dnspython.
Link-time optimization requires close coordination between the compiler
and the linker, so not all combinations of compiler and linker support
it.
Previously, when compiling with Clang, we checked only for lld. With
this commit, we expand the list of supported linkers we check for.
Closes #5536
Merge branch '5536-more-supported-linker-ids' into 'main'
See merge request isc-projects/bind9!11022
Meson boolean options are usually configured with enabled/disabled
instead of on/off. Make things more consistent with other meson options
by renaming -Dnamed-lto=off to -Dnamed-lto=disabled.
Restore usage of malloc_usable_size()/malloc_size(), but this time only
for memory accounting and statistics purposes. This should reduce the
memory footprint in case of compilation without jemalloc as we don't
have to keep track of the allocated memory size ourselves.
Merge branch 'ondrej/use-malloc_usable_size-when-available' into 'main'
See merge request isc-projects/bind9!11271
Instead of having our own implementation of memory junk filling, rely on
the jemalloc opt.junk feature (set with MALLOC_CONF="junk:true").
Merge branch 'ondrej/remove-memfill' into 'main'
See merge request isc-projects/bind9!11270
Since filling memory with junk patterns has been removed from the ISC
memory context in favor of the jemalloc opt.junk option, enable the
jemalloc behaviour by default in the GitLab CI.
As the fetch context reference counting was converted to userspace RCU
reference counting, the ability to debug the reference counting was
lost. Restore the debugging by adding the optional compile-time enabled
debugging output again.
Merge branch 'ondrej/add-tracing-to-fctx-reference-counting' into 'main'
See merge request isc-projects/bind9!11230
qctx_destroy() only needs to be called on allocated memory, whereas
qctx_deinit() must always be called. Also remove the .allocated member
from the query_ctx_t structure.
Merge branch 'ondrej/add-qctx-deinit' into 'main'
See merge request isc-projects/bind9!11273
The .delegating flag was only set, but never used in the dns_qpcache.
Remove it completely together with the code that was locking the node
to set the flag if the added type was DNAME.
Merge branch 'ondrej/remove-delegating-from-qpcache' into 'main'
See merge request isc-projects/bind9!10980
Upstream has removed the atomics implementation of CMM_LOAD_SHARED and
CMM_STORE_SHARED as these can also be used with non-stdatomics types.
As we only use the CMM API with stdatomics types, we can restore the
previous behaviour to prevent ThreadSanitizer warnings.
Closes #5660
Merge branch '5660-use-atomics-for-CMM-api-with-thread-sanitizer' into 'main'
See merge request isc-projects/bind9!11288
There are multiple reasons for the increased number of differences we've
been seeing lately and for raising the threshold:
1. Recent hardening against cache poisoning (CVE-2025-40778) has
uncovered a few edge cases where the domain can't be properly
resolved with the new protections in place, but those are issues with
upstream configuration and DNS setup.
2. The same hardening magnified some behaviour differences between 9.21
and older versions. Some misconfigured domains which can be resolved
with BIND 9.20 and older are no longer resolvable in 9.21+. This can
again be attributed to upstream DNS misconfiguration. See #5649.
3. A change in the respdiff CI job to include timeouts in the
comparison, or rather, an increase of the timeouts so that queries
which previously timed out, and which are typically failures, get
resolved. With the previous job configuration, those were omitted
from the comparison because they were timeouts. Now, there should be
no timeouts, but there is a slight increase in the number of
differences for the threshold evaluation.
This job is testing the current BIND implementation against the latest
released version. Unless there has been a behaviour change, there should
be no difference.
In practice, there is a small number of differences caused by upstream
discrepancies. Some of those cause "upstream unstable" answers which are
excluded from the results, but statistically, some of those will end up
being detected as differences on the resolver under test.
Currently, there seem to be about 300 upstream unstable answers with
typically around 50-60 differences. Setting the threshold to 0.1 should
be stable enough to pass if there are no changes, yet sensitive enough
to detect even fairly small changes to behaviour.