Commit graph

5172 commits

Author SHA1 Message Date
Andoni Duarte
6cae1d10ca Merge tag 'v9.21.22'
2026-05-20 10:26:28 +00:00
Ondřej Surý
965995c66a
Properly handle the return value of BN_num_bits()
BN_num_bits() returns 0 when passed NULL and a negative value on
internal error.  The OpenSSL wrappers stored the result in a size_t,
so a 0 return falsely satisfied the bit-length check and a negative
return wrapped to a huge value.  Capture the int return, reject
non-positive values, then compare against the limit.
2026-05-19 19:21:49 +02:00
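A minimal sketch of the pattern described in 965995c66a above. To stay buildable without OpenSSL, it takes the int that BN_num_bits() would return as a plain argument; the function names and the limit used by callers are hypothetical, not BIND's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old pattern: the int result is stored in a size_t before the check. */
static bool
bits_ok_unchecked(int raw_bits, size_t max_bits) {
	size_t bits = (size_t)raw_bits; /* 0 stays 0, -1 becomes SIZE_MAX */
	return bits <= max_bits;        /* a 0 return passes the check */
}

/* Fixed pattern: keep the int, reject non-positive values, then compare. */
static bool
bits_ok_checked(int raw_bits, size_t max_bits) {
	if (raw_bits <= 0) {
		return false; /* NULL input (0) or internal error (negative) */
	}
	return (size_t)raw_bits <= max_bits;
}
```

The cast to size_t in the fixed version is safe only because the value is already known to be positive.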
Ondřej Surý
6175577210
Use SipHash-1-3 for hash tables, keep SipHash-2-4 for cookies
SipHash-2-4 was designed as a conservative PRF/MAC with extra rounds
against future attacks.  For hash tables, where outputs are never
exposed, SipHash-1-3 provides sufficient collision resistance with
fewer rounds.  As the SipHash author noted: "I would be very surprised
if SipHash-1-3 introduced weaknesses for hash tables."

DNS cookies continue to use SipHash-2-4 since cookie values are sent
on the wire and must resist online attacks.
2026-05-15 08:15:59 +02:00
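To make the round counts in 6175577210 above concrete, here is a sketch of SipHash-c-d with both counts as parameters: c compression rounds per 8-byte word and d finalization rounds. It follows the published SipHash construction, but the function names are hypothetical and this is not BIND's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

static void
sipround(uint64_t v[4]) {
	v[0] += v[1]; v[1] = ROTL64(v[1], 13); v[1] ^= v[0]; v[0] = ROTL64(v[0], 32);
	v[2] += v[3]; v[3] = ROTL64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = ROTL64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = ROTL64(v[1], 17); v[1] ^= v[2]; v[2] = ROTL64(v[2], 32);
}

/* Load up to 8 bytes as a little-endian integer. */
static uint64_t
load_le64(const uint8_t *p, size_t n) {
	uint64_t w = 0;
	for (size_t i = 0; i < n; i++) {
		w |= (uint64_t)p[i] << (8 * i);
	}
	return w;
}

/* c = 2, d = 4 gives SipHash-2-4; c = 1, d = 3 gives SipHash-1-3. */
static uint64_t
siphash_cd(int c, int d, const uint8_t key[16], const uint8_t *in, size_t len) {
	uint64_t k0 = load_le64(key, 8), k1 = load_le64(key + 8, 8);
	uint64_t v[4] = { k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
			  k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL };
	size_t i = 0;

	for (; i + 8 <= len; i += 8) {
		uint64_t m = load_le64(in + i, 8);
		v[3] ^= m;
		for (int r = 0; r < c; r++) {
			sipround(v);
		}
		v[0] ^= m;
	}

	/* Final word: remaining bytes plus the length in the top byte. */
	uint64_t b = ((uint64_t)len << 56) | load_le64(in + i, len - i);
	v[3] ^= b;
	for (int r = 0; r < c; r++) {
		sipround(v);
	}
	v[0] ^= b;

	v[2] ^= 0xff;
	for (int r = 0; r < d; r++) {
		sipround(v);
	}
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

Under this sketch a hash table would call siphash_cd(1, 3, ...) and the cookie code siphash_cd(2, 4, ...), so the only difference between the two uses is the number of calls to sipround().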
Ondřej Surý
11bca1051f
Switch UDP fetches to TCP on the first response with a wrong query id
Until now, the dispatcher silently dropped UDP responses from the
expected peer that carried the wrong DNS message id and kept listening
for the correct id to arrive within the read timeout.  An off-path
attacker who knows the destination address and source port of an
outgoing fetch could exploit that quiet retry window to flood the
resolver with guessed responses; with a gigabit link the per-query
success probability grows linearly with the number of guesses that
arrive before the legitimate answer or the timeout.

Treat any such mismatch as a possible spoofing attempt and let the
resolver immediately retry the same query over TCP, the same control
path the truncation handler already uses.

Add a resolver statistics counter, exposed as 'queries retried over TCP
after a response with mismatched query id' in rndc stats and
'MismatchTCP' in the statistics channel.

Assisted-by: Claude:claude-opus-4-7
2026-05-14 15:56:18 +02:00
Aydın Mercan
4d16a8c9f2
Fix use-after-free in DoH write buffer after HTTP/2 send
After the send callback completes, the UV request is freed but
the HTTP/2 socket's write buffer still points to the freed memory.
If nghttp2 subsequently needs to send frames (e.g. SETTINGS ACK),
the server_read_callback reads from the dangling buffer.

Clear the write buffer before freeing the UV request.
2026-05-07 13:32:15 +02:00
Ondřej Surý
24ac3392d9
Make isc_mem_isovermem() probabilistic
Replace the hysteretic hi_water/lo_water switch with a stochastic
check: always false below lo_water, always true at or above hi_water,
linearly ramped probability in between.  This spreads cache cleaning
across many inserts instead of triggering a thundering herd once the
hi_water mark is crossed (which causes every addrdataset to enter the
LRU purge path simultaneously and serializes lookups behind the node
write locks).

The is_overmem atomic and its stores are no longer needed and are
removed.  The existing tests that asserted specific hysteretic state
transitions are simplified to check only the deterministic boundaries.
2026-05-07 13:32:15 +02:00
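The ramp described in 24ac3392d9 above can be sketched as follows. The function name is hypothetical, and the random draw is passed in by the caller (assumed uniform in [0, 1)) so the behaviour is easy to reason about; how BIND obtains its random value is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Always false below lo_water, always true at or above hi_water,
 * and true with linearly increasing probability in between.
 */
static bool
demo_isovermem(size_t inuse, size_t lo_water, size_t hi_water, double draw) {
	if (inuse < lo_water) {
		return false;
	}
	if (inuse >= hi_water) {
		return true;
	}
	/* Here lo_water <= inuse < hi_water, so the divisor is non-zero. */
	double p = (double)(inuse - lo_water) / (double)(hi_water - lo_water);
	return draw < p;
}
```

Halfway between the two marks, about half of the inserts see "over memory" and do cleaning work, instead of every insert doing so once the high mark is crossed.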
Ondřej Surý
4d465f4fa5 Dispatch ratelimiter events under the lock
isc__ratelimiter_tick() and isc_ratelimiter_shutdown() each pulled
events out of rl->pending into a function-local list, dropped the
mutex, and then iterated.  ISC_LIST_APPEND leaves the link in the
LINKED state, so a concurrent isc_ratelimiter_dequeue() saw an
event as still queued, called ISC_LIST_UNLINK against rl->pending —
which patched the prev/next of the local list — and freed the
event before dispatch finished, producing either an INSIST in the
unlink macro or a use-after-free in the dispatch loop.

isc_async_run() is a non-blocking wfcq enqueue, so there is no
benefit to dropping the mutex around it.  Unlink each event and
hand it to isc_async_run() while still holding rl->lock; the
existing ISC_LINK_LINKED check in dequeue then correctly
distinguishes "still queued and cancellable" from "already taken".

Assisted-by: Claude:claude-opus-4-7
2026-04-30 10:16:32 +02:00
Ondřej Surý
6082274450 Stop isc_file_safecreate from following symlinks
The function existence-checked the target with stat() and then opened
the same path without O_NOFOLLOW, so a symlink at the target path
passed the regular-file test against the link's destination and the
open() that followed truncated and wrote through the link.
rndc-confgen -a is typically run as root and writes the keyfile under
a directory that service accounts may have write access to, so a stray
symlink there would silently redirect the truncate, fchown, and
overwrite to whatever file the link pointed at.

Switch the existence check to lstat() and use S_ISREG() so a symlink's
S_IFLNK mode is detected directly (a plain bitmask of S_IFREG matches
both, since S_IFLNK shares its high bit). Add O_NOFOLLOW to both
open() flag sets to close the lstat/open TOCTOU window. Hardening
against unexpected symlinks on intermediate path components is out of
scope.

Assisted-by: Claude:claude-opus-4-7
2026-04-29 16:56:25 +02:00
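A sketch of the lstat() plus O_NOFOLLOW combination described in 6082274450 above. The function name, the file mode, and the choice of EEXIST as the error are assumptions for illustration; only the lstat(), S_ISREG(), and O_NOFOLLOW usage comes from the commit message.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns an open file descriptor, or -1 with errno set. */
static int
demo_safecreate(const char *path) {
	struct stat sb;
	int flags = O_WRONLY | O_NOFOLLOW;

	if (lstat(path, &sb) == 0) {
		/* lstat() reports the link itself, so a symlink is not S_ISREG. */
		if (!S_ISREG(sb.st_mode)) {
			errno = EEXIST; /* hypothetical error choice */
			return -1;
		}
		flags |= O_TRUNC;
	} else if (errno == ENOENT) {
		flags |= O_CREAT | O_EXCL;
	} else {
		return -1;
	}

	/*
	 * O_NOFOLLOW makes open() fail if a symlink appears at the final
	 * path component between the lstat() and the open().
	 */
	return open(path, flags, S_IRUSR | S_IWUSR);
}
```

As in the commit, this only protects the final path component; symlinks in intermediate directories are not covered.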
Aram Sargsyan
4ede6edc54 Remove OpenSSL memory tracking support from the ossl3.c module
OPENSSL_cleanup() in OpenSSL 4 doesn't free the memory, and that is
not compatible with BIND 9's memory leak detection code. Don't use
custom allocation/deallocation functions for OpenSSL's internal memory
management in the ossl3.c module.

See https://github.com/openssl/openssl/pull/29721
2026-04-28 14:42:40 +00:00
Aydın Mercan
48a77a4bfc don't set named curves explicitly in pre-3.0 libcrypto
The function `EC_KEY_set_asn1_flag` is deprecated in AWS-LC. Fortunately
calling it to make sure we use named curve keys is entirely unnecessary.

More information on pre-3.0 libcrypto and its significant forks
follows:

OpenSSL: Named curves were the default between 1.1.0 and 3.6.1 [1],[2]
AWS-LC: Library only supports named curves in the first place [3]
BoringSSL: Likewise with AWS-LC [4]
LibreSSL: `EC_GROUP`s are named by default [5]

[1] 86f300d385
[2] 9db6af922c
[3] a605df416b/include/openssl/ec_key.h (L442-L445)
[4] 514abb73bb/include/openssl/ec_key.h (L279-L280)
[5] c933874518/src/lib/libcrypto/ec/ec_lib.c (L94)
2026-04-28 09:28:18 +03:00
Alessio Podda
bcfa2adaa3 Add missing parenthesis to fxhash
The fxhash implementation had a missing parenthesis that caused it to
diverge from Rust's reference implementation. This commit fixes this.
2026-04-16 16:03:40 +02:00
Ondřej Surý
ad6f4e1992 Reduce memory footprint by enabling background page purging
Enable jemalloc background threads and reduce dirty page decay time from
10s to 1s so that unused memory is returned to the OS sooner.  As an
additional safety net, trigger a decay pass after every 16 MiB of frees
(rate-limited to once per second) to handle bursts that the background
thread might not catch in time.  On glibc, fall back to malloc_trim(0)
with the same volume-based trigger.
2026-04-08 16:42:19 +02:00
Aydın Mercan
2a62cd449f
include <sys/endian.h> based on a meson check
The <sys/endian.h> header has existed in macOS since around version 26. This
causes the `htobeNN`/`htoleNN` macros to be redefined in <isc/endian.h>
in terms of <libkern/OSByteOrder.h> when other system headers include
<sys/endian.h>.

Fix this issue by checking for the existence of <sys/endian.h> in
meson and including it according to the probe result.
2026-03-31 16:06:37 +03:00
Colin Vidal
b4abc63dfa Add ISC_LIST support for isc_netaddr_t
Add an `isc_netaddrlink_t` type wrapping an `isc_netaddr_t` and an
`ISC_LINK`. This makes it possible to build lists of `isc_netaddr_t`
without increasing the memory footprint of existing uses of
`isc_netaddr_t` (which don't need to be linked).
2026-03-30 20:41:13 +02:00
Alessio Podda
80be99d3ac Convert isc_statsmulti to use ISC_REFCOUNT_IMPL
Instead of using hand-rolled attach and detach functions, this commit
declares the same functions through the ISC_REFCOUNT_IMPL macro.
2026-03-26 10:19:25 +01:00
Alessio Podda
ed0ecb62e4 Add low contention stats counter
In the current statistics counter implementation, the statistics are
backed by an array of counters, which are updated via atomic operations.
This leads to contention, especially on high core count
machines.

This commit introduces a new isc_statsmulti_t counter that keeps a
separate array per thread. These counters are then aggregated only when
statistics are queried, shifting work off the critical path.

These changes lead to a ~2% improvement in perflab.
2026-03-26 10:19:25 +01:00
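The per-thread layout described in ed0ecb62e4 above can be sketched like this. All names, the fixed thread and counter counts, and the 64-byte cache line size are assumptions for illustration; this is not the isc_statsmulti_t implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { DEMO_THREADS = 4, DEMO_COUNTERS = 3 };

typedef struct {
	/* One row per thread, aligned so rows do not share a cache line. */
	struct {
		_Alignas(64) atomic_uint_fast64_t c[DEMO_COUNTERS];
	} row[DEMO_THREADS];
} demo_statsmulti_t;

/* Hot path: each thread touches only its own row. */
static void
demo_statsmulti_increment(demo_statsmulti_t *s, unsigned tid, unsigned counter) {
	atomic_fetch_add_explicit(&s->row[tid].c[counter], 1,
				  memory_order_relaxed);
}

/* Cold path: sum the rows only when statistics are queried. */
static uint64_t
demo_statsmulti_get(demo_statsmulti_t *s, unsigned counter) {
	uint64_t sum = 0;
	for (unsigned t = 0; t < DEMO_THREADS; t++) {
		sum += atomic_load_explicit(&s->row[t].c[counter],
					    memory_order_relaxed);
	}
	return sum;
}
```

The trade-off is memory for contention: the counter array is replicated per thread, and a read costs one pass over all rows.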
Ondřej Surý
24951b703e Move ISC_NONSTRING from util.h to attributes.h
ISC_NONSTRING is a compiler attribute macro and belongs alongside
the other attribute definitions in attributes.h, not in util.h.
2026-03-23 11:06:28 +01:00
Ondřej Surý
0f3be0beb8 Add MOVE_OWNERSHIP() macro for transferring pointer ownership
A helper macro that returns the current value of a pointer and sets
it to NULL in one expression, useful for transferring ownership in
designated initializers.
2026-03-23 11:06:28 +01:00
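One way a macro like the one in 0f3be0beb8 above could be written is with a GNU C statement expression, which GCC and Clang both accept. The definition and the demo_holder_t type below are assumptions for illustration; the actual BIND definition is not shown in the commit message.

```c
#include <stddef.h>
#include <stdlib.h>

/* Evaluates to the old value of 'ptr' and leaves 'ptr' set to NULL. */
#define MOVE_OWNERSHIP(ptr)                     \
	({                                      \
		__typeof__(ptr) moved_ = (ptr); \
		(ptr) = NULL;                   \
		moved_;                         \
	})

typedef struct {
	char *name;
} demo_holder_t;

/* The holder takes over *namep; the caller's pointer becomes NULL. */
static demo_holder_t
demo_holder_take(char **namep) {
	return (demo_holder_t){ .name = MOVE_OWNERSHIP(*namep) };
}
```

After demo_holder_take(&name), the caller can no longer free or reuse the pointer by accident, because its own copy is NULL.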
Ondřej Surý
9457f4f8c5
Fix data race in RCU pointer exchange operations
The liburcu rcu_cmpxchg_pointer() uses CMM_RELAXED ordering on the CAS
failure path.  When a thread loses the CAS and gets another thread's
pointer back, reading fields through that pointer is a data race on
weakly-ordered architectures (ARM, POWER) because the failing load has
no acquire semantics.

Override rcu_cmpxchg_pointer() and rcu_xchg_pointer() to use standard
__atomic builtins with __ATOMIC_ACQ_REL (success) and __ATOMIC_ACQUIRE
(failure) ordering.  This fixes the race on all architectures and is
natively visible to ThreadSanitizer.
2026-03-19 08:10:22 +01:00
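The orderings named in 9457f4f8c5 above map directly onto the GCC/Clang __atomic builtins. The wrappers below are a sketch with hypothetical names operating on plain void pointers; they are not the macros BIND uses to override liburcu.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns the value that was in *p before the call. On failure the
 * builtin writes the current value into 'expected' using ACQUIRE
 * ordering, so it is safe to read through the returned pointer.
 */
static inline void *
demo_cmpxchg_pointer(void **p, void *expected, void *desired) {
	(void)__atomic_compare_exchange_n(p, &expected, desired, false,
					  __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
	return expected;
}

/* Unconditional exchange; returns the previous value. */
static inline void *
demo_xchg_pointer(void **p, void *desired) {
	return __atomic_exchange_n(p, desired, __ATOMIC_ACQ_REL);
}
```

A caller that loses the race gets the winner's pointer back and can dereference it, which is the case the commit message identifies as racy under a relaxed failure ordering.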
Ondřej Surý
8e240bbb5f Fix isc_buffer_init capacity mismatch in DoH data chunk callback
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated.  This makes the buffer
believe it has more space than actually allocated.  A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.

Pass h2->content_length as the capacity to match the allocation.
2026-03-18 11:39:01 +01:00
Mark Andrews
d3ffa1f007 Clear errno before calling strtol
The previous code was incorrectly clearing errno after calling
strtol but before testing the result rather than clearing it and
then calling strtol so that changes to errno can be correctly
determined.
2026-03-17 10:51:37 +11:00
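The ordering that d3ffa1f007 above restores is the standard strtol() idiom: clear errno, call strtol(), then inspect errno. A minimal sketch, with a hypothetical helper name:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a whole string as a base-10 long; returns false on any error. */
static bool
demo_parse_long(const char *s, long *out) {
	char *end = NULL;

	errno = 0; /* must happen before the call, not after it */
	long v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0') {
		return false;
	}
	*out = v;
	return true;
}
```

Clearing errno after the call would discard the ERANGE that strtol() sets on overflow, while not clearing it at all could report a stale error from an earlier call.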
Matthijs Mekking
bc1d177cc2 Fast fail a validator deadlock
We return DNS_R_NOVALIDSIG if we detected a deadlock. Then in
'validate_async_done()', this result value is used to check if we
need to fall back to insecure. As part of that we create a new fetch
but that fails because of the detected deadlock. This results in a loop
of deadlock detected, fallback to insecure, deadlock detected, ...

Add a new result value, ISC_R_DEADLOCK, and return this instead when
we have detected a deadlock. This will be treated as a generic error,
as there is no special handling for this result value.
2026-03-16 16:46:51 +00:00
Aram Sargsyan
336c523b79 OpenSSL 4 compatibility fix
Starting from OpenSSL 4 the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
2026-03-16 10:01:18 +00:00
Ondřej Surý
2ab3d7c075
Fix missing server socket detach in TLS accept error path
When TLS creation fails in tlslisten_acceptcb(), tlssock->server
was not detached before detaching tlssock itself.
2026-03-14 13:58:32 +01:00
Ondřej Surý
3f15f2d9e5
Fix INSIST copy-paste error checking RADIX_V4 instead of RADIX_V6
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.

Fix the second pair to check RADIX_V6.
2026-03-14 11:03:31 +01:00
Ondřej Surý
f1311d2d19
Enforce isc_work enqueue loop affinity
Add a REQUIRE(isc_loop() == loop) assertion to isc_work_enqueue()
to strictly enforce that work is enqueued from the loop it is
assigned to. This loudly prohibits cross-thread queue manipulation
before it inevitably turns into a concurrency debugging nightmare.
2026-03-14 06:32:50 +01:00
Michal Nowak
239464f276
Use clang-format-22 to update formatting 2026-03-04 10:56:41 +01:00
Aram Sargsyan
7f5608206e Use standard reference counting for isc_histomulti
Use reference counting for the isc_histomulti module so that it's
possible to attach to and detach from the objects when used in the
statistics channel in the coming commits.
2026-02-26 14:00:10 +00:00
Mark Andrews
3801d0ebbf
Enforce NSEC3 record consistency
NSEC3 hashes are required to fit within a single DNS label.  Since there
are 5 bits per label byte without pad characters, the maximum hash size
is floor(63*5/8) (39 bytes).

This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
2026-02-24 14:57:22 +01:00
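The length rule in 3801d0ebbf above works out as follows: a 63-byte label carrying 5 bits per byte holds 315 bits, which is 39 whole bytes. The sketch below encodes that rule with hypothetical names; algorithm number 1 is SHA-1 with a 20-byte digest.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
	DEMO_LABEL_MAX = 63,
	/* floor(63 * 5 / 8) == 39 */
	DEMO_NSEC3_HASH_MAX = (DEMO_LABEL_MAX * 5) / 8,
	DEMO_NSEC3_ALG_SHA1 = 1,
	DEMO_SHA1_DIGEST_LEN = 20
};

static bool
demo_nsec3_hashlen_ok(unsigned alg, size_t len) {
	if (alg == DEMO_NSEC3_ALG_SHA1) {
		/* Known algorithm: the digest length must match exactly. */
		return len == DEMO_SHA1_DIGEST_LEN;
	}
	/* Unknown algorithm: only the single-label maximum applies. */
	return len <= (size_t)DEMO_NSEC3_HASH_MAX;
}
```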
Ondřej Surý
10270f6b42
Cleanup setting netmgr ports from isc_managers_create()
This is now redundant as the default ports are already set in
isc_netmgr_create().
2026-02-20 16:37:44 +01:00
Ondřej Surý
295139f8ca
Rename isc_net_getudpportrange() to isc_net_getportrange()
This better reflects the true nature of the function as we are reading
the ephemeral port range which is not related to UDP at all.
2026-02-20 14:06:23 +01:00
Ondřej Surý
04c81b55d2
Implement IP_LOCAL_PORT_RANGE socket option for Linux
For Linux >= 6.8:

Since 2023, Linux has introduced a change to the IP_LOCAL_PORT_RANGE
socket option that eliminates the need for the random window
shifting (implemented as a fallback in the next commit).

By setting the IP_LOCAL_PORT_RANGE option, we tell the kernel to use a
better approach to source port selection.

For Linux < 6.8:

This implements port selection by randomly shifting a range, leveraging the
IP_LOCAL_PORT_RANGE socket option.  The network manager is initialized
with the ephemeral port range (on startup and on reconfig) and then for
every outgoing TCP connection, we define a custom port range (1000
ports) and then randomly shift the custom range within the system range.

This helps the kernel to reduce the search space to the custom window
between <random_offset, random_offset + 1000>.

Reference:
https://blog.cloudflare.com/linux-transport-protocol-port-selection-performance/#kernel
2026-02-20 14:06:23 +01:00
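The random window shifting described in 04c81b55d2 above can be sketched as a pure function that picks a 1000-port window inside the system range and packs it the way IP_LOCAL_PORT_RANGE expects: a 32-bit value with the lower bound in the low 16 bits and the upper bound in the high 16 bits. The function name is hypothetical, the caller supplies the random number, and sys_lo <= sys_hi is assumed.

```c
#include <stdint.h>

enum { DEMO_PORT_WINDOW = 1000 };

/* Returns (hi << 16) | lo for a window inside [sys_lo, sys_hi]. */
static uint32_t
demo_pick_port_window(uint16_t sys_lo, uint16_t sys_hi, uint32_t rnd) {
	uint32_t span = (uint32_t)sys_hi - sys_lo + 1;
	uint32_t lo = sys_lo;
	uint32_t hi = sys_hi;

	if (span > DEMO_PORT_WINDOW) {
		/* Shift the window by a random offset that keeps it in range. */
		lo = sys_lo + rnd % (span - DEMO_PORT_WINDOW + 1);
		hi = lo + DEMO_PORT_WINDOW - 1;
	}
	return (hi << 16) | lo;
}
```

On Linux the packed value would then be passed to setsockopt() with level IPPROTO_IP and option IP_LOCAL_PORT_RANGE before connect(); that call is left out here so the sketch builds on any platform.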
Ondřej Surý
2c48fcaeed
Improve the source port selection on Linux
Since 2015, Linux has introduced a new socket option to overcome TCP
limitations: When an application needs to force a source IP on an active
TCP socket it has to use bind(IP, port=x).  As most applications do not
want to deal with already used ports, x is often set to 0, meaning the
kernel is in charge of finding an available port.  But the kernel does
not yet know if this socket is going to be a listener or be connected.
The IP_BIND_ADDRESS_NO_PORT socket option asks the kernel to ignore the 0
port provided by the application in bind(IP, port=0) and only remember the
given IP address. The port will be automatically chosen at connect()
time, in a way that allows sharing a source port as long as the 4-tuples
are unique.

Enable IP_BIND_ADDRESS_NO_PORT on the outgoing TCP sockets to overcome
this TCP limitation.
2026-02-20 14:06:23 +01:00
Aydın Mercan
a531f00a75
wipe hmac keys correctly in pre-3.0 libcrypto
A lingering `sizeof` from the prototype era of !11094 caused the
key-wipe in `isc_hmac_key_destroy` to use `sizeof(key->len)` instead of
`key->len` for the length argument of `isc_safe_memwipe`.

This results in a buffer overflow of zero bytes in HMAC keys that are
less than 4 bytes. As such, the overflow can only be visible in keys
that are shorter than 32 bits, which is beyond broken, and creating such
keys is only possible in testing.

Therefore, this change is *not* a security fix since the conditions are
never reachable in any imaginable deployment scenario.

Builds that use OpenSSL >=3.0 are unaffected as the `sizeof` was only
remaining in pre-3.0 builds.
2026-02-06 14:14:43 +03:00
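The sizeof mix-up described in a531f00a75 above is easy to reproduce in isolation. The struct, its fixed 64-byte buffer, and the volatile-loop wipe below are hypothetical stand-ins for the real key type and for isc_safe_memwipe().

```c
#include <stddef.h>

typedef struct {
	size_t len;             /* number of key bytes in use */
	unsigned char data[64]; /* hypothetical fixed-size storage */
} demo_hmac_key_t;

/* Stand-in wipe: volatile stores so the compiler keeps the writes. */
static void
demo_memwipe(void *p, size_t n) {
	volatile unsigned char *v = p;
	while (n-- > 0) {
		*v++ = 0;
	}
}

static void
demo_hmac_key_wipe(demo_hmac_key_t *key) {
	/*
	 * Wrong: demo_memwipe(key->data, sizeof(key->len));
	 * That wipes a fixed number of bytes, the size of the length
	 * field, no matter how long the key actually is.
	 */
	demo_memwipe(key->data, key->len);
}
```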
Aydın Mercan
19c9053a6b
use isc_ossl_wrap to generate ephemeral tls keys 2026-02-02 11:50:14 +03:00
Aydın Mercan
b748651bb0
explicitly set ec points properties in pre-3.0 openssl
Generating a P-256 key in pre-3.0 wasn't explicitly using uncompressed
named curves in DNSSEC but was when generating an ephemeral TLS key.
2026-02-02 11:50:14 +03:00
Aydın Mercan
251af02fe7
make generate_pkcs11_ec_key consistent with others 2026-02-02 11:50:14 +03:00
Aydın Mercan
c2f3a23a3e
expose isc__crypto_md in isc/ossl_wrap.h
This is a bit of a namespace convention violation but it fits the spirit of
this header since it is exposing OpenSSL-isms to others.

Further work is needed to make sure the exposed EVP_MD isn't needed
anymore.
2026-02-02 11:50:14 +03:00
Aydın Mercan
21f80a2bd7
make isc_ossl_wrap_ecdsa_set_deterministic consistent with style 2026-02-02 11:50:14 +03:00
Aydın Mercan
8c69fedc7c
switch away from ossl_param builders from ecdsa functions 2026-02-02 11:50:14 +03:00
Aydın Mercan
fe617aa830
set parameters in batch for rsa keygen
On top of improving readability, doing so allows us to use a uint32_t
for setting the e value, getting rid of allocating an unnecessary
BIGNUM.
2026-02-02 11:50:14 +03:00
Aydın Mercan
3bd3754994
remove libcrypto version specific code in opensslecdsa_link
Using `EVP_SIGNATURE` explicit algorithms for signatures was added
in OpenSSL 3.4 and so is skipped for the initial OpenSSL
version-specific code splitting.
2026-02-02 11:50:14 +03:00
Aydın Mercan
f4d88404e2
remove libcrypto version specific code in opensslrsa_link
Using `EVP_SIGNATURE` explicit algorithms for signatures was added
in OpenSSL 3.4 and so is skipped for the initial OpenSSL
version-specific code splitting.
2026-02-02 11:50:14 +03:00
Aydın Mercan
f21d237374
move openssl error reporting to isc/ossl_wrap
While it was the best place at the time, tlserr2result doesn't belong
inside TLS code since it is generic to OpenSSL and mostly used in the
dst interface. The newly created ossl_wrap interface is the ideal place
for flushing the OpenSSL thread error queue.
2026-02-02 11:50:14 +03:00
Aydın Mercan
c4a25e633c
add openssl_wrap
The isc_ossl_wrap API is intended to separate OpenSSL version specific
code that needs to expose the libcrypto internals and keep isc_crypto
clean.
2026-02-02 11:50:14 +03:00
Aydın Mercan
5ae9b4d14c
cleanup unused header in isc/md.h
Use `isc/crypto.h` whenever needed instead.
2026-02-02 11:50:14 +03:00
Aydın Mercan
8f106f2b66
Separate isc_hmac between pre and post OpenSSL 3.0
Instead of the `EVP_MD_CTX` based functions, use either the new
`EVP_MAC` or the old `HMAC_CTX` based functions.

`EVP_MAC` is the recommended way of using MAC functions in post-3.0
while `HMAC_CTX` is used internally by `EVP_MD_CTX`, making the latter
redundant.
2026-02-02 11:50:14 +03:00
Aydın Mercan
f9ec4a1cdf
switch isc_md_type_t to a proper enum
Get rid of the OpenSSL-isms that plague the codebase where the hash type
is `EVP_MD *`.

By using a proper enum, alongside the cleanup, we also get the ability
to use constants for known hash sizes instead of having a function call
every time.

`EVP_MD_CTX_get0_md` has been removed instead of being adapted since it
wasn't used anymore.
2026-02-02 11:12:55 +03:00
Aydın Mercan
35eeefb437
initial openssl version splitting
Dealing with OpenSSL has been rapidly turning into an unwieldy situation
as post-3.0 changes turn the library into a different beast.

Start treating pre and post-3.0 versions differently for easier
maintenance.
2026-02-02 11:12:53 +03:00
Mark Andrews
07610f8566 Add enum for use with isc_base64_tobuffer and isc_hex_tobuffer
This adds an enum with the values isc_one_or_more and isc_zero_or_more,
which specify whether one or more or zero or more bytes are required
when reading the unbounded base64 / hex encoded data.
2026-01-27 23:57:34 +11:00