During refactoring, a condition that prevented caching RRSIGs for
types that already have a cached NODATA record was changed in an
invalid way. This was caught later when a cached NODATA(type) +
RRSIG(type) was found in the cache and caused an assertion failure.
Fix and simplify the condition that prevents adding such RRSIGs.
Formerly, we evicted the RRSIG(type) only when changing an
existing header from positive to negative. Move the eviction routine
for the RRSIG to a common path, so the RRSIG also gets evicted when we
are adding a new negative header for a specific type.
If a `key` or `tls` is associated with an IP address inside a server-list,
only the existence of the `tls` in the configuration was checked. Also, if
a `key` or `tls` is associated with a named server-list inside a
server-list, there was no check at all.
Add the check for making sure a `key` is defined in the configuration,
as well as the check for `key` and `tls` when used on a named
server-list.
`check.c` only checks if `remote-servers`, `primaries`, etc. are not
duplicated inside the configuration file, but does not check the
correctness of its definition. This commit fixes this by calling
`validate_remotes()` for each `remote-servers` (and other aliases),
which validates the correctness of the definition itself (this is the
same call done to validate other cases like `also-notify`, etc.).
The remote-servers clause enables the following pattern:
remote-servers a { 1.2.3.4; ... };
remote-servers b { a key foo; };
However, `check.c` was explicitly throwing an error if a `key` or `tls`
was provided after a named server-list. Remove this check, as this is a
valid use case.
When named is being reconfigured, it detaches from the old
'isc_tlsctx_cache_t' TLS context cache object and creates a
new one. This can cause an assertion failure within the
resolver when the object is destroyed while still in use,
because the resolver is using the object without getting
attached to it.
Add an attach/detach so that the 'isc_tlsctx_cache_t' doesn't
get destroyed while still being in use.
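The attach/detach pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual BIND 9 code: `cache_t`, `cache_create()`, `cache_attach()` and `cache_detach()` are hypothetical stand-ins for the 'isc_tlsctx_cache_t' API, and the counter is deliberately non-atomic to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted cache object. */
typedef struct cache {
	unsigned int references; /* not atomic: single-threaded sketch only */
	bool *destroyed;         /* set on free, for illustration only */
} cache_t;

static cache_t *
cache_create(bool *destroyed) {
	cache_t *cache = malloc(sizeof(*cache));
	assert(cache != NULL && destroyed != NULL);
	cache->references = 1; /* the creator holds the first reference */
	cache->destroyed = destroyed;
	*destroyed = false;
	return cache;
}

/* Take an additional reference on behalf of a new user. */
static void
cache_attach(cache_t *source, cache_t **targetp) {
	assert(source != NULL && targetp != NULL && *targetp == NULL);
	source->references++;
	*targetp = source;
}

/* Drop one reference; the object is freed when the last one goes away. */
static void
cache_detach(cache_t **cachep) {
	assert(cachep != NULL && *cachep != NULL);
	cache_t *cache = *cachep;
	*cachep = NULL;
	if (--cache->references == 0) {
		*cache->destroyed = true;
		free(cache);
	}
}
```

With this shape, the owner detaching during reconfiguration does not free the object as long as another user (here, the resolver) still holds its own reference.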
Maintain the relationship between the parent and child fetches and, when
creating a new child fetch, properly check for resolution loops in which
the new fetch would join one of the parent's fetch contexts.
In !9155, the QNAME minimization was changed to not leak the query type
to the parent name server. This violates RFC 9156 Section 3, step (3)
and it is not necessary. It also breaks some (weird) authoritative DNS
setups, especially when CNAMEs are involved. Also there is really no
privacy leak with query type.
Restore usage of malloc_usable_size()/malloc_size(), but this time only
for memory accounting and statistics purposes. This should reduce the
memory footprint in case of compilation without jemalloc as we don't
have to keep track of the allocated memory size ourselves.
As the fetch context reference counting was converted to userspace RCU
reference counting, the ability to debug the reference counting was
lost. Restore the debugging by adding the optional compile-time enabled
debugging output again.
qctx_destroy() only needs to be called on allocated memory, while
qctx_deinit() always needs to be called. Also remove the .allocated
member from the query_ctx_t structure.
The .delegating flag was only set, but never used in the dns_qpcache.
Remove it completely together with the code that was locking the node
to set the flag if the added type was DNAME.
Upstream has removed the atomics implementation of CMM_LOAD_SHARED and
CMM_STORE_SHARED as these can be used also with non-stdatomics types.
As we only use the CMM api with stdatomics types, we can restore the
previous behaviour to prevent ThreadSanitizer warnings.
When synchronizing the secure database, we skip DNSSEC records that
BIND 9 maintains with inline-signing. We should also skip private
RDATA type records that are used to track the current state of a
zone-signing process.
now that the EDNS state is stored within dns_message_t, it's no longer
necessary to have a public API call to build an opt rdataset; we can
just have dns_message_setopt() build the opt record internally.
The new dns_message_ednsinit() and dns_message_ednsaddopt() functions
allow EDNS options to be added to a message one at a time; it is no
longer necessary to construct a full array of EDNS options and set
them all at once.
This allows us to simplify EDNS option handling code, and in the
future it will allow plugins to add EDNS options to existing
messages.
when merging view objects into the effective configuration, add
allow-query-cache, allow-recursion, allow-query-cache-on and
allow-recursion-on ACLs as needed to reflect the way those
options inherit from each other.
this means the effective configuration is now correct for each
view. ACLs no longer need to be corrected when applying the
configuration, and the actual effective ACL values will be
displayed in "rndc showconf" and "named-checkconf -pe".
the merging of options and defaults into the effective configuration
broke the mutual inheritance of the allow-recursion, allow-query, and
allow-query-cache ACLs, and of the allow-recursion-on and
allow-query-cache-on ACLs.
this has been corrected by adding a 'cloned' flag to the cfg_obj
structure to indicate whether it was configured explicitly or
cloned from the defaults during parsing. we can then adjust the
ACLs while configuring a view, favoring user-configured values
when they're available over cloned defaults.
currently the adjustments to the ACLs are done in configure_view();
later they'll be moved into the effective configuration and this
special handling can be removed.
The call to `streamdns_resume_processing` is asynchronous, but the socket
passed as an argument is not attached when the call is scheduled.
While there is no reproducible way (so far) to bring the socket reference
count down to 0 before `streamdns_resume_processing` is called, attach
the socket before scheduling the call. This guards against a hypothetical
case where, for some reason, the socket refcount would reach 0 and the
socket would be freed from memory by the time
`streamdns_resume_processing` is called.
The fctx_getaddresses() function was lengthy and a little bit confusing
with its goto statements. Split the single function into smaller parts:
one for forwarders, one for nameservers and one for alternates.
The dns_resolver mode of operation is to resolve all the domains as it
iterates the DNS tree to fill up the cache as quickly as possible.
This commit reduces the number of outgoing queries by limiting the
number of remote fetches started for nameserver address resolution
via dns_adb_createfind() to a smaller number per depth of recursion
since the delegation point (3 2 1 0), where 0 means a fetch is only
created on demand if we don't have any addresses yet.
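One possible reading of the "(3 2 1 0)" budget is sketched below. The table values come from the text above; everything else (`find_state_t`, `fetches_allowed()`, `should_start_fetch()`, and the idea that depths beyond the table reuse the last entry) is a hypothetical illustration, not the resolver's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fetch budget indexed by recursion depth since the delegation point. */
static const unsigned int fetch_budget[] = { 3, 2, 1, 0 };

/* Hypothetical per-lookup state; not a BIND 9 structure. */
typedef struct {
	size_t depth;         /* recursion depth since the delegation point */
	unsigned int started; /* fetches already started at this depth */
	bool have_addresses;  /* is at least one address already known? */
} find_state_t;

static unsigned int
fetches_allowed(size_t depth) {
	size_t last = sizeof(fetch_budget) / sizeof(fetch_budget[0]) - 1;
	/* Assumption: depths beyond the table reuse the last entry (0). */
	return fetch_budget[depth < last ? depth : last];
}

static bool
should_start_fetch(const find_state_t *state) {
	assert(state != NULL);
	unsigned int budget = fetches_allowed(state->depth);
	if (budget == 0) {
		/* On demand only: fetch when no address is known yet. */
		return !state->have_addresses;
	}
	return state->started < budget;
}
```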
The prefetch statement now enforces its bounds. The configuration
(including `named-checkconf`) now fails if the trigger (first value) is
above 10, or if the eligibility (second optional value) isn't at least
six seconds more than the trigger value.
Catalog-zones can't be used in views which are not in the IN class.
This is now enforced: the server refuses to load (instead of loading
without the catalog-zone). This configuration error is now also caught
by `named-checkconf`.
The `dns_zoneflg_t` enum defines multiple possible flags for a zone, but
contains numerous holes (likely from flags removed in the past). This
fixes the holes and uses bit-shift and decimal notation to make holes
easier to spot.
as previously mentioned in commit c65b2868ab, a cfg_obj_t
configuration tree structure takes up considerably more space than
the canonical text. since the zone configuration saved in the zone
object using dns_zone_setcfg() is only currently used for "rndc
showzone", it can be saved as text more efficiently than as an
object tree. (and, if a tree were needed, the text could be
re-parsed quickly; zone configuration text is generally small.)
Add the query ID to the query trace message. The log now looks like the
following (the id is at the end):
query client=0x7f75c5017000 thread=0x7f75c6dfe680(foo.fr/A): \
client attr:0x22300, query attr:0x700, restarts:0, \
origqname:foo.fr, timer:0, authdb:0, referral:0, id:21338
This should help debugging tests, in particular to quickly get a
specific query from the logs.
Scheduling and rescheduling a zonefetch is also similar. Refactor into
zonefetch functions. This also increments and decrements the zone's
internal reference counter in the same module, which may be less
confusing when reading the code.
When looking for a signing key in select_signing_key(), the result code
indicating unsupported algorithm would abort the search. Instead, skip
such keys and continue searching for the right key.
Co-Authored-By: Aram Sargsyan <aram@isc.org>
Co-Authored-By: Petr Menšík <pemensik@redhat.com>
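The "skip and continue" behaviour described for select_signing_key() can be sketched as below. The result codes, `candidate_t` and `select_key()` are hypothetical simplifications and do not mirror the real function's signature or BIND 9 result codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for trying one candidate key. */
typedef enum {
	R_SUCCESS,     /* key is usable */
	R_NOTFOUND,    /* key does not match; keep looking */
	R_UNSUPPORTED, /* algorithm not supported; now skipped as well */
	R_FAILURE      /* hard error; abort the search */
} result_t;

typedef struct {
	int id;          /* hypothetical key identifier */
	result_t result; /* outcome of trying to use this key */
} candidate_t;

static result_t
select_key(const candidate_t *keys, size_t nkeys, int *idp) {
	assert(keys != NULL && idp != NULL);
	for (size_t i = 0; i < nkeys; i++) {
		switch (keys[i].result) {
		case R_SUCCESS:
			*idp = keys[i].id;
			return R_SUCCESS;
		case R_NOTFOUND:
		case R_UNSUPPORTED:
			/* Skip this key and try the next one. */
			continue;
		default:
			/* Any other error still aborts the search. */
			return keys[i].result;
		}
	}
	return R_NOTFOUND;
}
```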
Under overmem conditions, the header could get unlinked from the
SIEVE LRU using a different path. This could lead to a double-unlink,
which causes an assertion failure. Add a guard to ISC_SIEVE_UNLINK() to
unlink only headers that are still linked.
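The idea of guarding an unlink can be shown with a plain doubly-linked list. This is a generic sketch, not the ISC_SIEVE implementation: `node_t`, `list_t`, `list_append()` and `list_unlink()` are hypothetical, and the explicit `linked` flag is just one way to record membership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
	struct node *prev, *next;
	bool linked; /* true while the node is on the list */
} node_t;

typedef struct {
	node_t *head, *tail;
	size_t count;
} list_t;

static void
list_append(list_t *list, node_t *node) {
	assert(list != NULL && node != NULL && !node->linked);
	node->prev = list->tail;
	node->next = NULL;
	if (list->tail != NULL) {
		list->tail->next = node;
	} else {
		list->head = node;
	}
	list->tail = node;
	node->linked = true;
	list->count++;
}

/* Returns true if the node was unlinked, false if it was not on the list. */
static bool
list_unlink(list_t *list, node_t *node) {
	assert(list != NULL && node != NULL);
	if (!node->linked) {
		return false; /* guard: already unlinked via another path */
	}
	if (node->prev != NULL) {
		node->prev->next = node->next;
	} else {
		list->head = node->next;
	}
	if (node->next != NULL) {
		node->next->prev = node->prev;
	} else {
		list->tail = node->prev;
	}
	node->prev = node->next = NULL;
	node->linked = false;
	list->count--;
	return true;
}
```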
When a (secondary) zone is expired, the log message `<zone> expired` is
printed and the flag `DNS_ZONEFLG_EXPIRED` is set. Change the order by
setting the expired flag first, then printing the log.
This should fix a (rare but persistent) timing-related CI error where the
EDE 24 tests wait for the zone to be expired (according to the log),
send a request immediately afterwards, and expect an EDE 24 error. (In
some rare cases, the server was still answering the query.)
Extended DNS Error 24 (Invalid Data) is returned when the server cannot
answer with data for a zone it is configured for. This typically occurs
when an authoritative server has not loaded the DB of a configured
zone, or when a secondary zone is expired.
See RFC 8914 section 4.25.
If `query_getzonedb()` finds a zone but the zone is expired, it
immediately returns `DNS_R_EXPIRED` and doesn't attempt to get the zone
DB (which would be NULL in this case).
This enables the caller to get a more precise reason why getting the DB
has failed.
Introduce the `dns_zone_isexpired()` API which returns `true` when a
secondary, mirror, etc. zone is expired.
This internally uses the `DNS_ZONEFLG_EXPIRED` flag, which was already
set when the zone expired, but never used.
The flag `DNS_ZONEFLG_EXPIRED` is now also cleared when the expiration
time of the zone is updated and lies in the future.
CID 638286: Concurrent data access violations (MISSING_LOCK). This
complains about accessing "zone->notifyctx.notify_acl" without holding
the lock "dns_zone.lock". Elsewhere, reading this data does have the
lock, so it makes sense that in the getter function this must also be
so. However, the function is unused so we can just remove it.
CID 638287: Concurrent data access violations (MISSING_LOCK). This
complains about accessing "zone->locked" without holding the lock
"dns_zone.lock". I think this is a false positive as "dns__zone_lock()"
and "dns__zone_unlock()" are wrappers around "LOCK_ZONE()" and
"UNLOCK_ZONE()" and where these macros were used they were only
replaced with the internal zone functions. Moreover, "zone->locked"
is only accessed in these macros (and "TRYLOCK_ZONE()" and
"LOCKED_ZONE()").
Changes introduced by 72862c2abc moved the
default configuration from within `bin/named` to a central place
`bin/includes`.
The default configuration is conditioned by several compile-time macros.
While for most of them it's fine because they are defined in the global
`config.h` file included by default in all binaries (by meson), one
specific macro is not defined there. `HAVE_SO_REUSEPORT_LB` was defined in
`lib/isc/include/isc/netmgr.h` which is of course not included in
`bin/includes/defaultconfig.h`.
As a result, reuseport was disabled for all platforms by default, even
the supported ones. This fixes the problem by checking if reuseport is
available on the platform directly during meson `config.h` generation,
which makes `HAVE_SO_REUSEPORT_LB` available everywhere.
Move dns_notify_destroy, dns_notify_log, dns_notify_cancel,
dns_notify_queue, dns_notify_isqueued, dns_notify_find_address, and
notify related static functions over to the notify source files.