Add an `isc_netaddrlink_t` type wrapping an `isc_netaddr_t` and an
`ISC_LINK`. This enable to build list of `isc_netaddr_t` without
increasing the memory footprint of existing usages of `isc_netaddr_t`
(which doesn't require to be linked).
In the current statistics counter implementation, the statistics are
backed by an array of counters, which are updated via atomic operations.
This leads to contention, especially on high core count
machines.
This commit introduces a new isc_statsmulti_t counter that keeps a
separate array per thread. These counters are then aggregated only when
statistics are queried, shifting work off the critical path.
These changes lead to a ~2% improvement in perflab.
A helper macro that returns the current value of a pointer and sets
it to NULL in one expression, useful for transferring ownership in
designated initializers.
The liburcu rcu_cmpxchg_pointer() uses CMM_RELAXED ordering on the CAS
failure path. When a thread loses the CAS and gets another thread's
pointer back, reading fields through that pointer is a data race on
weakly-ordered architectures (ARM, POWER) because the failing load has
no acquire semantics.
Override rcu_cmpxchg_pointer() and rcu_xchg_pointer() to use standard
__atomic builtins with __ATOMIC_ACQ_REL (success) and __ATOMIC_ACQUIRE
(failure) ordering. This fixes the race on all architectures and is
natively visible to ThreadSanitizer.
We return DNS_R_NOVALIDSIG if we detected a deadlock. Then in
'validate_async_done()', this result value is used to check if we
need to fall back to insecure. As part of that we create a new fetch
but that fails because of the detected deadlock. This results in a loop
of deadlock detected, fallback to insecure, deadlock detected, ...
Add a new result value, ISC_R_DEADLOCK, and return this instead when
we have detected a deadlock. This will be treated as a generic error,
as there is no special handling for this result value.
Use reference counting for isc_histomulti module so that it's
possible to attach/detach to/from the objects when used in the
statistics channel in the coming commits.
NSEC3 hashes are required to fit within a single DNS label. Since there
are 5 bits per label byte without pad characters, the maximum hash size
is floor(63*5/8) (39 bytes).
This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
For Linux >= 6.8:
Since 2023, Linux has introduced a change to the IP_LOCAL_PORT_RANGE
socket option that eliminates the need for the random window
shifting (implemented as a fallback in the next commit).
By setting IP_LOCAL_PORT_RANGE option, we tell the kernel to use better
approach to the source port selection.
For Linux << 6.8:
This implement selecting port by random shifting range leveraging the
IP_LOCAL_PORT_RANGE socket option. The network manager is initialized
with the ephemeral port range (on startup and on reconfig) and then for
every outgoing TCP connection, we define a custom port range (1000
ports) and then randomly shift the custom range within the system range.
This helps the kernel to reduce the search space to the custom window
between <random_offset, random_offset + 1000>.
Reference:
https://blog.cloudflare.com/linux-transport-protocol-port-selection-performance/#kernel
This is a bit of a namespace convention violation but it fits the spirit of
this header since it is exposing OpenSSL-isms to others.
Further work is needed to make sure the exposed EVP_MD isn't needed
anymore.
Using `EVP_SIGNATURE` explicit algoritms for signatures have been added
in OpenSSL 3.4 and so is skipped for the initial OpenSSL version
specific code splitting.
Using `EVP_SIGNATURE` explicit algoritms for signatures have been added
in OpenSSL 3.4 and so is skipped for the initial OpenSSL version
specific code splitting.
While being the best place at the time, the tlserr2result doesn't belong
inside TLS code since it is generic to OpenSSL and mostly used in the
dst interface. The newly created ossl_wrap interface is the idea place
for flushing the OpenSSL thread error queue.
Instead of the `EVP_MD_CTX` based functions, use either the new
`EVP_MAC` or the old `HMAC_CTX` based functions.
`EVP_MAC` is the recommended way using using MAC functions in post-3.0
while `HMAC_CTX` is used internally by `EVP_MD_CTX`, making the latter
redundant.
Get rid of the OpenSSL-isms that plague the codebase where the hash type
is `EVP_MD *`
By using a proper enum, alongside the cleanup, we also get the ability
to use constants for known hash sizes instead of having a function call
every time.
`EVP_MD_CTX_get0_md` has been removed instead of being adapted since it
wasn't used anymore.
Dealing with OpenSSL has been rapidly turning into an unwieldy situation
as post-3.0 changes turn the library into a different beast.
Start treating pre and post-3.0 versions differently for easier
maintenance.
This adds the following enum isc_one_or_more and isc_zero_or_more
which specify if one or more or zeror or more bytes are required
when reading the unbounded base64 / hex encoded data.
While building on uclibc this error is thrown:
In file included from ./include/dns/log.h:20,
from callbacks.c:19:
../../lib/isc/include/isc/log.h:141:9: error: unknown type name ‘off_t’
141 | off_t maximum_size;
| ^~~~~
This is due to missing include unistd.h, so let's add it on top of
isc/log.h
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
When the CDS/CDNSKEY RRset gets updated, schedule a NOTIFY(CDS) to be
sent to the parental agent. The parental agent is published in the
parent zone as a DSYNC RRset, so first we need to figure out the
parent owner name. This is done by finding the zonecut (querying for
NS RRset until we find a postive answer).
In nsfetch_dsync, we then schedule a zone fetch for the DSYNC record
at <child-labels>._dsync.<parent-labels>. Then we queue the notify
for each target in the DSYNC records that matches the NOTIFY scheme
and CDS RRtype.
Document the way `__attribute__((__constructor__))` and
`__attribute__((__destructor__))` must be used in BIND9 libraries in
order to avoid unexpected behaviors with other third-party libraries.
previously, there were over 40 separate definitions of CHECK macros, of
which most used "goto cleanup", and the rest "goto failure" or "goto
out". there were another 10 definitions of RETERR, of which most were
identical to CHECK, but some simply returned a result code instead of
jumping to a cleanup label.
this has now been standardized throughout the code base: RETERR is for
returning an error code in the case of an error, and CHECK is for jumping
to a cleanup tag, which is now always called "cleanup". both macros are
defined in isc/util.h.
Maintain the relationship between the parent and child fetch and when
creating a new child fetch, properly check the resolution loops that
would lead to a new fetch would join one of the parent's fetch contexts.
Upstream has removed the atomics implementation of CMM_LOAD_SHARED and
CMM_STORE_SHARED as these can be used also with non-stdatomics types.
As we only use the CMM api with stdatomics types, we can restore the
previous behaviour to prevent ThreadSanitizer warnings.
Under the overmem conditions, the header could get unlinked from the
SIEVE LRU using a different path. This could lead to double-unlink
which causes assertion failure. Add a guard to ISC_SIEVE_UNLINK() to
unlink only still linked headers.
Changes introduced by 72862c2abc moved the
default configuration from within `bin/named` to a central place
`bin/includes`.
The default configuration is conditioned by several compile-time macro.
While for most of them it's fine because they are defined in the global
`config.h` file included by default to all binaries (by meson), one
specific is not defined here. `HAVE_SO_REUSEPORT_LB` was defined in
`lib/isc/include/isc/netmgr.h` which is of course not included in
`bin/includes/defaultconfig.h`.
As a result, reuseport was disabled for all platform by default, even
the supported ones. This fixes the problem by checking if reuseport is
available on the platform from meson `config.h` generation directly,
which makes `HAVE_SO_REUSEPORT_LB` available everywhere.
The sun_path field is not used anymore, and consumes over a hundred
bytes for every isc_netaddr_t object. Remove it.
As isc_netaddr_t is used in cfg_obj_t, in some huge configuration trees
(e.g., a million zones), the gain is almost 1GB of resident memory.
When the arc4random_uniform() is called on NetBSD with upper_bound that
makes no sense statistically (0 or 1), the call crashes the calling
program. Fix this by returning 0 when upper bound is < 2 as does Linux,
FreeBSD and NetBSD. (Hint: System CSPRNG should never crash.)
if a zone reload is already in progress when 'rndc reload <zone>' is
run, currently the message returned in "zone reload queued", which
is correct, but it's identical to the message returned when a reload
was *not* in progress, so the user can't easily tell what happened.
a user could reload a zone twice and not realize that only one
reload actually took place.
this has been addressed by changing the message returned to
"zone reload was already queued".
a new result code ISC_R_LOADING has been added to signal this
condition, taking the place of ISC_R_RELOAD, which was obsolete
and has been removed.
Use arc4random on platforms where available. arc4random() provides high
quality cryptographically-secure pseudo-random numbers and is generally
recommended for application use.
The uv_random() call unfortunately uses getentropy() on platforms like
MacOS, OpenBSD or NetBSD which is not recommended for application use.
It was discovered in an upcoming academic paper that a xoshiro128**
internal state can be recovered by an external 3rd party allowing to
predict UDP ports and DNS IDs in the outgoing queries. This could lead
to an attacker spoofing the DNS answers with great efficiency and
poisoning the DNS cache.
Change the internal random generator to system CSPRNG with buffering to
avoid excessive syscalls.
Thanks Omer Ben Simhon and Amit Klein of Hebrew University of Jerusalem
for responsibly reporting this to us. Very cool research!
Functions hex_decode_init(), hex_decode_char() and hex_decode_finish()
are now exposed, as well as the context hex_decode_ctx_t. They now are
respectively called isc_hex_decodeinit(), isc_hex_decodechar(),
isc_hex_decodefinish() and isc_hex_decodectx_t.
This enable to re-implement the functionality of isc_hex_decodestring()
in contextes where the input is not a NULL-terminated string, but, for
example, individual characters extracted (and avoid creating an
intermediate buffer to store them). This also enable to decode a stream
of hex characters where only hex characters are expected (i.e. no white
spaces).
Previously, the fetch contexts were stored inside rwlocked hashmap
table. This was one of the most contended places for the resolver,
especially in the cold cache situation.
Replace the locked hashmap with the lock-free hashtable from the RCU
library and protect the fetch contexts against reuse by replacing the
libisc reference counting with urcu_ref that can soft-fail in situation
where the reference count is already zero. This allows us to easily
skip re-using the fetch context if it is already in process of being
destroyed.
This is the first commit in series that aims to reduce the node locking
by replacing the single-linked list of slabtop(s) with CDS linked list.
This commit doesn't do anything else beyond replacing .next link with
the cds_list_head. RCU semantics is going to be added in the subsequent
commits.
Clang 20 is complaining about passing NULL to an argument with 'nonnull'
attribute. Mark these two functions with the same attribute to assure
that these two function also don't accept NULL as an argument.
In gcc 15, __builtin_stdc_rotate_{left,right} was added. Use these
builtins when available otherwise rewrite the ISC_ROTATE_LEFT and
ISC_ROTATE_RIGHT using _Generic.
Use C23 stdckdint.h when available and define ckd_{mul,add,sub} shims to
__builtin_{mul,add,sub}_overflow(). Require the __builtin functions
unconditionally.
Currently following __builtin functions are used:
__builtin_add_overflow
__builtin_mul_overflow
__builtin_prefetch
__builtin_sub_overflow
__builtin_unreachable
These are generally available on our supported platform, and also we use
some of these unconditionally anyway in qp.c. Thus make the support for
these functions mandatory so we fail early in the 'setup' step.
The fxhash implementation was missing a constant for 32-bit platforms.
This has been fixed. Constant for 64-bit platform was update to match
the current Rust constants.