Commit graph

16472 commits

Author SHA1 Message Date
Ondřej Surý
8e240bbb5f Fix isc_buffer_init capacity mismatch in DoH data chunk callback
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated.  This makes the buffer
believe it has more space than actually allocated.  A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.

Pass h2->content_length as the capacity to match the allocation.
2026-03-18 11:39:01 +01:00
Ondřej Surý
7f8b972a3d
Remove NZF support, make LMDB required for new zone storage
Drop the NZF (New Zone File) fallback for persisting runtime zone
configurations, making LMDB (NZD) the only storage backend. This
removes all #ifdef HAVE_LMDB conditionals, the meson 'lmdb' option,
and the NZF-related functions. LMDB is now a mandatory build
dependency.

The named-nzd2nzf tool is now always built.
2026-03-18 11:02:33 +01:00
Ondřej Surý
5b1750f15f Fix missing mutex destroy and ede invalidate on fctx_create() error paths
The error cleanup in fctx_create() was missing isc_mutex_destroy() and
dns_ede_invalidate() calls. When error paths (cleanup_nameservers,
cleanup_fcount, cleanup_qmessage, cleanup_adb) were taken after the
mutex and edectx were initialized, the fctx memory was freed without
properly destroying these resources first.
2026-03-17 16:05:11 +01:00
Ondřej Surý
96a22451d7 Fix rwlock type mismatch in delete_ds() error path
The lock is acquired for reading but the error path from
dns_rdata_fromstruct() incorrectly unlocks it as a write lock.
2026-03-17 16:05:11 +01:00
Mark Andrews
d3ffa1f007 Clear errno before calling strtol
The previous code was incorrectly clearing errno after calling
strtol but before testing the result rather than clearing it and
then calling strtol so that changes to errno can be correctly
determined.
2026-03-17 10:51:37 +11:00
Matthijs Mekking
bc1d177cc2 Fast fail a validator deadlock
We return DNS_R_NOVALIDSIG if we detected a deadlock. Then in
'validate_async_done()', this result value is used to check if we
need to fall back to insecure. As part of that we create a new fetch
but that fails because of the detected deadlock. This results in a loop
of deadlock detected, fallback to insecure, deadlock detected, ...

Add a new result value, ISC_R_DEADLOCK, and return this instead when
we have detected a deadlock. This will be treated as a generic error,
as there is no special handling for this result value.
2026-03-16 16:46:51 +00:00
Ondřej Surý
6e286beaa6 Cleanup weird syntax defining struct dns_ixfr
The struct dns_ixfr was defined as part of struct dns_xfrin, probably
because at some point it was an anonymous struct and then it was changed
to named struct with typedef at the top.  Move the definition from
struct dns_xfrin into and fold into the typedef ... dns_ixfr_t.
2026-03-16 12:17:06 +01:00
Ondřej Surý
f4b4f030c4 Cleanup the duplicate logic and comments around add into NSEC tree
After merging the NORMAL, NSEC and NSEC3 tree into single QP tree, there
were some comments still speaking about auxiliary NSEC tree.  These were
cleaned up and the logic when we pass the qp tree (write transaction) to
qpzone_addrdataset_inner() was changed to be more obvious that this is
needed only when we are adding NSEC records.
2026-03-16 12:17:06 +01:00
Ondřej Surý
e57245ee81
Fix use-after-free in xfrin_recv_done
Move the LIBDNS_XFRIN_RECV_DONE probe execution before dns_xfrin_detach
in xfrin_recv_done.

Previously, dns_xfrin_detach was called before the trace probe, which
could free the xfr object.  Because the accessed member xfr->info is an
embedded array, the expression evaluates via pointer arithmetic rather
than a direct memory dereference.  Although this prevents a reliable
crash in practice, it technically remains a use-after-free issue.
Reorder the statements to ensure the transfer context is fully valid
when the probe executes.
2026-03-16 11:06:06 +01:00
Aram Sargsyan
336c523b79 OpenSSL 4 compatibility fix
Starting from OpenSSL 4 the the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
2026-03-16 10:01:18 +00:00
Ondřej Surý
df1993611b
Fix KASP key leaks on keystore lookup failure
In both cfg_kasp_fromconfig() and cfg_kasp_builtinconfig(), the
newly allocated KASP key was not destroyed when the keystore
lookup failed.
2026-03-14 13:58:32 +01:00
Ondřej Surý
2ab3d7c075
Fix missing server socket detach in TLS accept error path
When TLS creation fails in tlslisten_acceptcb(), tlssock->server
was not detached before detaching tlssock itself.
2026-03-14 13:58:32 +01:00
Ondřej Surý
63d3c1f58a
Simplify checkds_create() to return void
Since memory allocation never fails in BIND 9, checkds_create() cannot
fail.  Change it to return void and use designated initializers,
removing error handling at all call sites.
2026-03-14 13:58:26 +01:00
Ondřej Surý
d7e1013741
Fix cb_args memory leak in ns_query() error path
Initialize cb_args to NULL and free it in the cleanup path so it
is not leaked when the function fails after allocation.
2026-03-14 13:48:08 +01:00
Ondřej Surý
1505cb1c24
Fix TSIG key and transport leaks in zone_notify() error paths
Two 'goto next' paths in zone_notify() skipped detaching the TSIG
key and transport, leaking them on TLS configuration failure and
when the destination address is disabled.
2026-03-14 13:48:08 +01:00
Ondřej Surý
80fae7a4b7
Fix memory leak in ixfr_commit() error path
The 'data' allocation was not freed when reaching the cleanup
label with an error result.
2026-03-14 13:48:08 +01:00
Ondřej Surý
d0165070c7
Fix memory context leak in dns_client_resolve() error path
Use isc_mem_putanddetach() instead of isc_mem_put() to properly
detach the attached memory context stored in resarg->mctx.
2026-03-14 13:47:48 +01:00
Aram Sargsyan
4df5b9ac32
Fix a bug in rpz.c:del_name()
When the dns_qp_getname() call returns an error the del_name() function
just returns without cleaning up the trasnaction.

Instead of returning, jump to a new label 'done:' similar to the code
written in the add_nm() function.
2026-03-14 13:01:55 +01:00
Ondřej Surý
3f15f2d9e5
Fix INSIST copy-paste error checking RADIX_V4 instead of RADIX_V6
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.

Fix the second pair to check RADIX_V6.
2026-03-14 11:03:31 +01:00
Ondřej Surý
66ce33603b
Fix port validation rejecting valid port 65535
A few port validation checks use >= UINT16_MAX instead of > UINT16_MAX,
incorrectly rejecting port 65535 as out of range.  Port 65535 is a valid
TCP/UDP port number.  Other port checks in the same file already use the
correct > comparison.
2026-03-14 10:11:55 +01:00
Ondřej Surý
5cd17c8adc
Fix memory leak in dns_catz_options_setdefault() for zonedir
When defaults->zonedir is set, opts->zonedir is unconditionally
overwritten without freeing the previous value. This leaks memory
on every catalog zone update when zonedir defaults are configured.

Free the existing opts->zonedir before replacing it.
2026-03-14 07:57:00 +01:00
Ondřej Surý
e7c550730a
Dispatch async work jobs from the correct loop
Refactor dns_loadctx_t and dns_dumpctx_t to use standard
ISC_REFCOUNT_DECL and ISC_REFCOUNT_IMPL macros, retiring the
redundant manual attach and detach implementations.

Introduce dns_loadctx_enqueue() and dns_dumpctx_enqueue() to
ensure compliance with the new strict loop affinity in
isc_work_enqueue(). If the current loop does not match the
target loop, the enqueue operation is safely bounced to the
correct thread via isc_async_run().
2026-03-14 06:32:54 +01:00
Ondřej Surý
f1311d2d19
Enforce isc_work enqueue loop affinity
Add a REQUIRE(isc_loop() == loop) assertion to isc_work_enqueue()
to strictly enforce that work is enqueued from the loop it is
assigned to. This loudly prohibits cross-thread queue manipulation
before it inevitably turns into a concurrency debugging nightmare.
2026-03-14 06:32:50 +01:00
Mark Andrews
cfa21d1e8b Set length in dns_rdata_in_dhcid structure
tostruct_in_dhcid was not setting the length field in the
dns_rdata_in_dhcid structure.
2026-03-12 14:08:32 +11:00
Ondřej Surý
2da669490c Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference.

When dns_dispatch_connect() fails synchronously on TCP (e.g. from
dns_transport_get_tlsctx() failing in tcp_dispatch_connect()), the
connect callback is never scheduled, so the extra reference is never
consumed.  The error path then tears down the query via manual cleanup
(isc_mem_put) without going through the refcount destructor, leaving
the reference imbalanced.

Fix by dropping the extra reference on the error path, just after
dns_dispatch_done() which cleans up the dispatch entry.
2026-03-10 17:58:43 +01:00
Ondřej Surý
0d28e1bed2 Fix copy-paste typos in dns_dispatchmgr comments
The v6ports and nv6ports fields are documented as "available ports
for IPv4" instead of "IPv6".
2026-03-10 17:58:43 +01:00
Alessio Podda
547c280002 Replace lock keyfile hashmap with lock pool
Kasp used a lock per zone origin in order to prevent concurrent access
to keyfiles. This lead to substantial memory consumption in the case of
authoritative servers with many small zones, as lots of locks need to be
allocated.

Since the number of keyfile locks taken cannot exceed the number of
helper threads, it makes more sense to use a lock pool of fixed size
keyed by the hash of the origin name, leading to memory savings.
2026-03-06 12:31:24 +01:00
Mark Andrews
05c69f4103 Fix setting retire in dns_keymgr_key_init
The wrong variable was passed to dst_key_gettime when attempting
to set retire.
2026-03-05 10:14:45 +00:00
Ondřej Surý
c1ba80169c
Introduce max-delegation-servers configuration option
Make the maximum number of processed delegation nameservers configurable
via the new 'max-delegation-servers' option (default: 13), replacing the
hardcoded NS_PROCESSING_LIMIT (20).

The default is reduced to 13 to precisely match the maximum number of
root servers that can fit into a classic 512-byte UDP payload.  This
provides a natural, historically sound cap that mitigates resource
exhaustion and amplification attacks from artificially inflated or
misconfigured delegations.

The configuration option is strictly bounded between 1 and 100 to ensure
resolver stability.
2026-03-04 16:13:49 +01:00
Michal Nowak
239464f276
Use clang-format-22 to update formatting 2026-03-04 10:56:41 +01:00
Colin Vidal
b90399ebdc checkconf: check key existence in views
Commit `2956e4fc45b3c2142a3351682d4200647448f193` hardened the `key`
name check when used in `primaries` to reject the configuration if
the key was not defined, rather than simply checking whether the
key name was correctly formed.

However, the key name check didn't include the view configuration,
causing keys not to be recognized if they were defined inside the
view and not at the global level.  This regression is now fixed.
2026-02-28 17:08:42 -08:00
Aram Sargsyan
450a4189b3 Calculate incoming queries RTT statistics
In client_senddone() update the RTT statistics for the incoming
queries.
2026-02-26 14:00:10 +00:00
Aram Sargsyan
393f932dbf Keep client->inner.tnow and client->inner.now in sync
The incoming queries RTT statistics are going to need correct time
information for calculations.
2026-02-26 14:00:10 +00:00
Aram Sargsyan
e41fbea843 Replace the outgoing queries RTT histogram code with isc_histomulti
The granularity of the simple histogram with fixed number of ranges
sometimes isn't good enough. As there's a need to implement a new
histogram statistics for the incoming query times (RTT), it was decided
to also update the existing RTT statistics of the outgoing queries
so that they look similar and use common code.

Remove the old histogram code from the resolver and from the statistics
channel. Reimplement the outgoing queries RTT histogram using the
isc_histomulti module, and prepare the necessary base for implementing
the incoming queries RTT histogram. The statistics channel will be
updated to expose the new histograms in an upcoming commit.
2026-02-26 14:00:10 +00:00
Aram Sargsyan
7f5608206e Use standard reference counting for isc_histomulti
Use reference counting for isc_histomulti module so that it's
possible to attach/detach to/from the objects when used in the
statistics channel in the coming commits.
2026-02-26 14:00:10 +00:00
Ondřej Surý
3c33e7d937
Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.

The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates.  The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.

This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.

This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
2026-02-26 06:57:53 +01:00
Matthijs Mekking
5bd6322739 Fix log level bug in keystore
A debug message that logs a PKCS#11 object has been generated was
erroneously logged at error level. This has been fixed.
2026-02-25 11:34:07 +01:00
Mark Andrews
b78052119a Remove determinist selection of nameserver
When selecting nameserver addresses to be looked up we where
always selecting them in dnssec name order from the start of
the nameserver rrset.  This could lead to resolution failure
despite there being address that could be resolved for the
other names.  Use a random starting point when selecting which
names to lookup.
2026-02-25 09:27:03 +01:00
Ondřej Surý
46cfac0825
Remove purged adb names and entries from SIEVE list immediately
Both `expire_name()` and `expire_entry()` use the isc_async mechanism to
remove names and entries from the SIEVE-LRU lists on the matching
isc_loop.

Under heavy load when the cleaning mechanism didn't have the chance to
kick in yet, this delay could lead to double-counting the purged names
and entries when purging the SIEVE-LRU lists during an overmem
condition.  This would result in insufficient memory being cleaned up,
causing the ADB to never recover from the overmem condition and leading
to an OOM crash of `named`.

This patch resolves the issue by bypassing the async queue and executing
the removal synchronously if the target loop matches the current
isc_loop().
2026-02-25 07:26:38 +01:00
Ondřej Surý
8ab4827a0c Importing invalid SKR file might overflow the stack buffer
If an invalid SKR file is imported, reading the time from the token
buffer might overflow the buffer on the local stack.  This has been
fixed by removing the intermediate buffer and parsing the lexer token
directly.
2026-02-24 19:44:57 +01:00
Mark Andrews
f030bc6756
Remove invalid REQUIRE in NSEC3 fromstruct method
The NSEC3 fromstruct method only worked for hash type 1
when it should work for all hash types.
2026-02-24 14:58:18 +01:00
Mark Andrews
3801d0ebbf
Enforce NSEC3 record consistency
NSEC3 hashes are required to fit within a single DNS label.  Since there
are 5 bits per label byte without pad characters, the maximum hash size
is floor(63*5/8) (39 bytes).

This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
2026-02-24 14:57:22 +01:00
Ondřej Surý
67b4fb56e4
Invalid NSEC3 can cause OOB read of the isdelegation() stack
When .next_length is longer than NSEC3_MAX_HASH_LENGTH, it causes a
harmless out-of-bound read of the isdelegation() stack.  This patch
fixes the issue by skipping NSEC3 records with an oversized hash length
during validation.
2026-02-24 14:56:29 +01:00
Ondřej Surý
f983a64152
Fail DNSKEY validation when supported but invalid DS is found
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms.  When the parent has both supported and
unsupported algorithm in the DS record, the validator would treat the
supported DS algorithm as insecure when validating DNSKEY records
instead of BOGUS.  This has not security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
2026-02-23 11:34:43 +01:00
Ondřej Surý
d46277b398 Clear serve-stale flags when following the CNAME chains
A stale answer or SERVFAIL could have been served in case of multiple
upstream failures when following the CNAME chains. This has been fixed.
2026-02-23 08:07:12 +01:00
Ondřej Surý
10270f6b42
Cleanup setting netmgr ports from isc_managers_create()
This is now duplicate as the default ports are already set in
isc_netmgr_create().
2026-02-20 16:37:44 +01:00
Ondřej Surý
295139f8ca
Rename isc_net_getudpportrange() to isc_net_getportrange()
This better reflects the true nature of the function as we are reading
the ephemeral port range which is not related to UDP at all.
2026-02-20 14:06:23 +01:00
Ondřej Surý
04c81b55d2
Implement IP_LOCAL_PORT_RANGE socket option for Linux
For Linux >= 6.8:

Since 2023, Linux has introduced a change to the IP_LOCAL_PORT_RANGE
socket option that eliminates the need for the random window
shifting (implemented as a fallback in the next commit).

By setting IP_LOCAL_PORT_RANGE option, we tell the kernel to use better
approach to the source port selection.

For Linux << 6.8:

This implement selecting port by random shifting range leveraging the
IP_LOCAL_PORT_RANGE socket option.  The network manager is initialized
with the ephemeral port range (on startup and on reconfig) and then for
every outgoing TCP connection, we define a custom port range (1000
ports) and then randomly shift the custom range within the system range.

This helps the kernel to reduce the search space to the custom window
between <random_offset, random_offset + 1000>.

Reference:
https://blog.cloudflare.com/linux-transport-protocol-port-selection-performance/#kernel
2026-02-20 14:06:23 +01:00
Ondřej Surý
2c48fcaeed
Improve the source port selection on Linux
Since 2015, Linux has introduced a new socket option to overcome TCP
limitations: When an application needs to force a source IP on an active
TCP socket it has to use bind(IP, port=x).  As most applications do not
want to deal with already used ports, x is often set to 0, meaning the
kernel is in charge to find an available port.  But kernel does not know
yet if this socket is going to be a listener or be connected. This
IP_BIND_ADDRESS_NO_PORT socket option ask the kernel to ignore the 0
port provided by application in bind(IP, port=0) and only remember the
given IP address. The port will be automatically chosen at connect()
time, in a way that allows sharing a source port as long as the 4-tuples
are unique.

Enable IP_BIND_ADDRESS_NO_PORT on the outgoing TCP sockets to overcome
this TCP limitation.
2026-02-20 14:06:23 +01:00
Ondřej Surý
c3ec414d88
Remove return value from isc_net_getudpportrange()
The function was already marked as never failing, always returning
ISC_R_SUCCESS, so there was a lot of dead code around checking whether
the result would be ISC_R_SUCCESS.  This has been cleaned up.
2026-02-20 14:06:23 +01:00