Commit graph

4954 commits

Author SHA1 Message Date
Pavel Březina
93bef0ea28 mark loop as shuttingdown earlier in shutdown_cb
`shutdown_trigger_close_cb` is not called in the main loop since
queued events in the `loop->async_trigger`, including loop teardown
(shutdown_server), are processed first, before the `uv_close` callback
is executed.

In order to pass the information to the queued events, it is necessary
to set the flag earlier in the process and not wait for the `uv_close`
callback to trigger.

(cherry picked from commit 67e21d94d4)
2024-12-10 19:52:13 +00:00
Ondřej Surý
476757770b Update picohttpparser.{c,h} with upstream repository
Upstream code doesn't do regular releases, so we need to regularly
sync the code from the upstream repository.  This is synchronization up
to the commit f8d0513 from Jan 29, 2024.

(cherry picked from commit d14a76e115)
2024-12-08 12:30:07 +00:00
Matthijs Mekking
a7b291adc7 Fix nsupdate hang when processing a large update
The root cause is the fix for CVE-2024-0760 (part 3), which resets
the TCP connection on a failed send. Specifically commit
4b7c61381f stops reading on the socket
because the TCP connection is throttling.

When the tcpdns_send_cb callback thinks about restarting reading
on the socket, this fails because the socket is a client socket.
And nsupdate is a client and is using the same netmgr code.

This commit removes the requirement that the socket must be a server
socket, allowing reading on the socket again after being throttled.

(cherry picked from commit aa24b77d8b)
2024-12-06 08:31:19 +00:00
Matthijs Mekking
492f79560d Implement global limit for outgoing queries
This global limit is not reset on query restarts and is a hard limit
for any client request.

(cherry picked from commit 16b3bd1cc7)
2024-12-06 06:20:33 +00:00
Matthijs Mekking
511c86facb Implement getter function for counter limit
(cherry picked from commit ca7d487357)
2024-12-06 06:20:33 +00:00
Ondřej Surý
624ea6c57e
Move contributed DLZ modules into a separate repository
The DLZ modules are poorly maintained, as we only ensure they can still
be compiled.  The DLZ interface is blocking, so anything that blocks the
query to the database blocks the whole server, and the modules should
not be used except in testing.  The DLZ interface itself should be
scheduled for removal.

(cherry picked from commit a6cce753e2)
2024-11-26 16:24:17 +01:00
Alessio Podda
0472494417 Incrementally apply AXFR transfer
Reintroduce logic to apply diffs when the number of pending tuples is
above 128. The previous strategy of accumulating all the tuples and
pushing them at the end leads to excessive memory consumption during
transfer.

This effectively reverts half of e3892805d6
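
A minimal sketch of the incremental strategy described above, not the
actual xfrin code: tuples are buffered and applied as soon as more than
128 are pending, so memory use stays bounded.  The `diff_sketch_t` type
and both function names are hypothetical; only the threshold of 128
comes from the commit message.

```c
#include <stddef.h>

/* Threshold taken from the commit message; the name is hypothetical. */
#define PENDING_LIMIT 128

/* Hypothetical stand-in for a diff: it only counts tuples. */
typedef struct {
	size_t pending; /* tuples buffered but not yet applied */
	size_t applied; /* tuples already pushed to the database */
	size_t flushes; /* how many times the buffer was applied */
} diff_sketch_t;

/* Apply everything that is currently buffered. */
static void
diff_flush(diff_sketch_t *d) {
	if (d->pending == 0) {
		return;
	}
	d->applied += d->pending;
	d->pending = 0;
	d->flushes++;
}

/* Buffer one tuple and apply the buffer once it grows above the limit. */
static void
diff_add_tuple(diff_sketch_t *d) {
	d->pending++;
	if (d->pending > PENDING_LIMIT) {
		diff_flush(d);
	}
}
```

A caller would still invoke `diff_flush()` once at the end of the
transfer to apply the remaining tuples.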

(cherry picked from commit 99b4f01b33)
2024-11-26 07:17:06 +00:00
Mark Andrews
983d8a6821 Provide more visibility into configuration errors
by logging SSL_CTX_use_certificate_chain_file and
SSL_CTX_use_PrivateKey_file errors

(cherry picked from commit 9006839ed7)
2024-11-26 12:25:01 +11:00
Ondřej Surý
c22176c0f9
Remove redundant semicolons after the closing braces of functions
(cherry picked from commit 1a19ce39db)
2024-11-19 14:26:56 +01:00
Ondřej Surý
58a15d38c2
Remove redundant parentheses from the return statement
(cherry picked from commit 0258850f20)
2024-11-19 14:26:52 +01:00
Evan Hunt
b5475c9cda corrected code style errors
- add missing brackets around one-line statements
- add parentheses around return values
2024-10-18 19:31:56 +00:00
Mark Andrews
887e874e93 Fix recursive-clients 0
Setting recursive-clients 0 triggered an assertion in isc_quota_soft.
This has now been fixed.

(cherry picked from commit 840eaa628d)
2024-10-17 22:05:22 +00:00
Petr Menšík
75a50925f7 Remove unused <openssl/{hmac,engine}.h> headers from OpenSSL shims
The <openssl/{hmac,engine}.h> headers were unused and including the
<openssl/engine.h> header might cause build failure when OpenSSL
doesn't have Engines support enabled.

See https://fedoraproject.org/wiki/Changes/OpensslDeprecateEngine
2024-10-16 04:39:43 +00:00
Ondřej Surý
4b4c550cd8 Don't enable SO_REUSEADDR on outgoing UDP sockets
Currently, the outgoing UDP sockets have enabled
SO_REUSEADDR (SO_REUSEPORT on BSDs) which allows multiple UDP sockets to
bind to the same address+port.  There's one caveat though - only a
single (the last one) socket is going to receive all the incoming
traffic.  This in turn could lead to an incoming DNS message being
matched to an invalid dns_dispatch and getting dropped.

Disable setting the SO_REUSEADDR on the outgoing UDP sockets.  This
needs to be done explicitly because `uv_udp_open()` silently enables the
option on the socket.
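
A rough illustration of explicitly clearing the option on a plain POSIX
socket.  This is a sketch under assumptions, not the netmgr code: it
does not use libuv, it handles only SO_REUSEADDR (not the SO_REUSEPORT
case mentioned for BSDs), and both helper names are hypothetical.

```c
#include <sys/socket.h>

/* Clear SO_REUSEADDR on `fd`.  Returns 0 on success, -1 on failure. */
static int
disable_reuseaddr(int fd) {
	int off = 0;
	return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &off, sizeof(off));
}

/* Return 1 if SO_REUSEADDR is set on `fd`, 0 if not, -1 on error. */
static int
get_reuseaddr(int fd) {
	int val = 0;
	socklen_t len = sizeof(val);
	if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) != 0) {
		return -1;
	}
	return val != 0;
}
```

Reading the option back with `getsockopt()` is one way to confirm that
a library did not silently enable it again.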

(cherry picked from commit eec30c33c2)
2024-10-02 12:16:58 +00:00
Ondřej Surý
5701bf9dab
Use release memory ordering when incrementing reference counter
As the relaxed memory ordering doesn't ensure any memory
synchronization, it is possible that the increment will succeed even
in the case when it should not - there is a race between
atomic_fetch_sub(..., acq_rel) and atomic_fetch_add(..., relaxed).
Only the result is consistent, but the previous value for both calls
could be the same when both calls are executed at the same time.
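
A small C11 sketch of the two calls named above, with the increment
using release ordering as the commit title states.  The type and
function names are hypothetical, and a single-threaded example like this
shows only the API shape, not the race itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reference counter type. */
typedef atomic_uint_fast32_t refcount_sketch_t;

/* Increment with release ordering; returns the previous value. */
static uint_fast32_t
refcount_increment(refcount_sketch_t *ref) {
	return atomic_fetch_add_explicit(ref, 1, memory_order_release);
}

/* Decrement with acquire-release ordering; returns the previous value. */
static uint_fast32_t
refcount_decrement(refcount_sketch_t *ref) {
	return atomic_fetch_sub_explicit(ref, 1, memory_order_acq_rel);
}
```

Both helpers return the value held before the operation, which is what
a caller would inspect to detect an increment from zero or a final
decrement.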

(cherry picked from commit 88227ea665)
2024-10-02 09:09:35 +02:00
Nicki Křížek
f2fa1b7d63
Update code formatting
clang 19 was updated in the base image.

(cherry picked from commit ebb5bd9c0f)
2024-09-21 12:45:27 +02:00
Nicki Křížek
38fb8bed49 Revert "Double the number of threadpool threads"
This reverts commit 6857df20a4.

(cherry picked from commit 842abe9fbf)
2024-09-20 14:51:33 +00:00
Nicki Křížek
379d7faeac Merge tag 'v9.20.2' into bind-9.20 2024-09-18 18:06:27 +02:00
Ondřej Surý
6bff6df272
Limit the outgoing UDP send queue size
If the operating system UDP queue gets full and the outgoing UDP sending
starts to be delayed, BIND 9 could exhibit memory spikes as it tries to
enqueue all the outgoing UDP messages.  As those are not going to be
delivered anyway (as we argued when we stopped enlarging the operating
system send and receive buffers), try to send the UDP messages directly
using `uv_udp_try_send()` and if that fails, drop the outgoing UDP
message.

(cherry picked from commit b576c4c977)
2024-09-17 16:31:25 +02:00
alessio
6e42d96cf1 Do not set SO_INCOMING_CPU
We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej
shows that fixing the issue and setting affinities is worse than letting
the kernel schedule threads without constraints. So we should not set
SO_INCOMING_CPU anymore.

(cherry picked from commit 8b8149cdd2)
2024-09-16 12:57:08 +00:00
Ondřej Surý
17f23224d1 Add isc_helper API that adds 1:1 thread for each loop
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.

Each isc_loop now has a matching isc_helper thread that is also built on
top of uv_loop.  In fact, it matches most of the isc_loop functionality,
but only the `isc_helper_run()` asynchronous call is exposed.

(cherry picked from commit 6370e9b311)
2024-09-12 14:39:07 +00:00
Michal Nowak
0aeefb9741 Update code formatting
clang 19 was updated in the base image.

(cherry picked from commit ff69d07fed)
2024-09-11 09:33:13 +00:00
Nicki Křížek
4d8491396d Double the number of threadpool threads
Introduce this temporary workaround to reduce the impact of long-running
tasks in offload threads which can block the resolution of queries.

(cherry picked from commit 6857df20a4)
2024-09-06 14:55:38 +02:00
Ondřej Surý
5255843f9b Follow the number of CPU set by taskset/cpuset
Administrators may wish to constrain the set of cores that BIND 9 runs
on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on
other O/S), for example to achieve higher (or more stable) performance
by more closely associating threads with individual NIC rx queues. If
the admin has used taskset, it follows that BIND ought to
automatically use the given number of CPUs rather than the system wide
count.

Co-Authored-By: Ray Bellis <ray@isc.org>
(cherry picked from commit 5a2df8caf5)
2024-09-03 13:52:10 +00:00
Ondřej Surý
619d21b57c Stop using malloc_usable_size and malloc_size
Although the manual page of malloc_usable_size says:

    Although the excess bytes can be over‐written by the application
    without ill effects, this is not good programming practice: the
    number of excess bytes in an allocation depends on the underlying
    implementation.

it looks like the premise is broken with _FORTIFY_SOURCE=3 on newer
systems and it might return a value that causes the program to stop with
"buffer overflow" detected from the _FORTIFY_SOURCE.  As we have our own
implementation that tracks the allocation size, we can stop relying on
this introspection function.

Also the newer manual page for malloc_usable_size changed the NOTES to:

    The value returned by malloc_usable_size() may be greater than the
    requested size of the allocation because of various internal
    implementation details, none of which the programmer should rely on.
    This function is intended to only be used for diagnostics and
    statistics; writing to the excess memory without first calling
    realloc(3) to resize the allocation is not supported.  The returned
    value is only valid at the time of the call.

Remove usage of both malloc_usable_size() and malloc_size() to be on the
safe side and only use the internal size tracking mechanism when
jemalloc is not available.
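
One common way to track allocation sizes internally is to store the
requested size in a header placed in front of each allocation.  The
sketch below illustrates that general technique only; it is not claimed
to match the isc_mem implementation, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Header stored in front of every allocation.  The union pads it to the
 * strictest alignment so the user pointer stays suitably aligned.
 */
typedef union {
	size_t size;
	max_align_t align;
} size_info_t;

/* Allocate `size` bytes and remember the requested size. */
static void *
tracked_alloc(size_t size) {
	if (size > SIZE_MAX - sizeof(size_info_t)) {
		return NULL; /* header + size would overflow */
	}
	size_info_t *si = malloc(sizeof(*si) + size);
	if (si == NULL) {
		return NULL;
	}
	si->size = size;
	return si + 1;
}

/* Return the size originally requested for `ptr`. */
static size_t
tracked_size(const void *ptr) {
	const size_info_t *si = (const size_info_t *)ptr - 1;
	return si->size;
}

/* Release an allocation made with tracked_alloc(). */
static void
tracked_free(void *ptr) {
	if (ptr != NULL) {
		free((size_info_t *)ptr - 1);
	}
}
```

Unlike malloc_usable_size(), `tracked_size()` reports exactly what the
caller asked for, so nothing depends on allocator internals.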

(cherry picked from commit d61712d14e)
2024-08-26 18:27:01 +00:00
Matthijs Mekking
6f6d000103 Apply SKR bundle on rekey
When a zone has a skr structure, look up the currently active bundle
that contains the right key and signature material.

(cherry picked from commit 63e058c29e)
2024-08-22 10:17:08 +00:00
Ondřej Surý
46069fe5c7 Use clang-format-19 to update formatting
This is purely the result of running:

    git-clang-format-19 --binary clang-format-19 origin/main

(cherry picked from commit 7b756350f5)
2024-08-22 08:16:03 +00:00
Ondřej Surý
97a9e4711c Remove code to read and parse /proc/net/if_inet6 on Linux
The getifaddrs() function has worked fine for years, so we don't have to
keep the callback to parse /proc/net/if_inet6 anymore.

(cherry picked from commit 2fbf9757b8)
2024-08-19 11:49:56 +00:00
Ondřej Surý
2a0454f881 Ignore errno returned from rewind() in the interface iterator
The clang-scan 19 has reported that we are ignoring errno after the call
to rewind().  As we don't really care about the result, just silence the
error, the whole code will be removed in the development version anyway
as it is not needed.

(cherry picked from commit dda5ba53df)
2024-08-19 11:49:56 +00:00
Ondřej Surý
530f1dd913 Check the result of dirfd() before calling unlinkat()
Instead of directly using the result of dirfd() in the unlinkat() call,
check whether the returned file descriptor is actually valid.  That
doesn't really change the logic as the unlinkat() would fail with
invalid descriptor anyway, but this is cleaner and will report the right
error returned directly by dirfd() instead of EBADF from unlinkat().

(cherry picked from commit 59f4fdebc0)
2024-08-19 10:03:08 +00:00
Ondřej Surý
dc4c0397eb Use constexpr for NS_PER_SEC and friends constants
The constexpr specifier introduced in the C23 standard makes perfect
sense to be used instead of preprocessor macros - the symbols are kept,
etc.  Define
ISC_CONSTEXPR to be `constexpr` for C23 and `static const` for the older
C standards.  Use the newly introduced macro for the NS_PER_SEC and
friends time constants.
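
A sketch of how such a macro could be defined.  The version test on
`__STDC_VERSION__` is an assumption about how C23 is detected, and
MS_PER_SEC and US_PER_SEC are assumed examples of the "friends"; only
ISC_CONSTEXPR and NS_PER_SEC are named in the commit message.

```c
/* Use constexpr when compiling as C23, static const otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define ISC_CONSTEXPR constexpr
#else
#define ISC_CONSTEXPR static const
#endif

/* Time conversion constants (the first two names are assumptions). */
ISC_CONSTEXPR unsigned int MS_PER_SEC = 1000;
ISC_CONSTEXPR unsigned int US_PER_SEC = 1000 * 1000;
ISC_CONSTEXPR unsigned int NS_PER_SEC = 1000 * 1000 * 1000;
```

Either expansion yields typed objects that remain visible in a debugger,
which a `#define` would not.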

(cherry picked from commit 122a142241)
2024-08-19 09:10:04 +00:00
Ondřej Surý
27a7647559 Change the NS_PER_SEC (and friends) from enum to static const
A new version of clang (19) has introduced stricter checks when mixing
integer (and float) types with enums.  In this case, we used enum {}
as C17 doesn't have constexpr yet.  Change the time conversion constants
to be static const unsigned int instead of enum values.

(cherry picked from commit b03e90e0d4)
2024-08-19 09:10:04 +00:00
Aram Sargsyan
864d55081e Check if logconfig is NULL before using it in isc_log_doit()
Check if 'lctx->logconfig' is NULL before using it in isc_log_doit(),
because it's possible that isc_log_destroy() was already called, e.g.
when a 'call_rcu' function wants to log a message during shutdown.

(cherry picked from commit 656e04f48a)
2024-08-15 14:27:29 +00:00
Ondřej Surý
14302330f4 Skip already rehashed positions in the old hashmap table
When iterating through the old internal hashmap table, skip all the
nodes that have been already migrated to the new table.  We know that
all positions with index less than .hiter are NULL.

(cherry picked from commit 3e4d153453)
2024-08-15 12:09:28 +00:00
Ondřej Surý
61b88c56cd Fix the assertion failure in the isc_hashmap iterator
When the Robin Hood hashing reorders the map entries on deletion, we
were adjusting the iterator table size only when the reordering was
happening at the internal table boundary.  The iterator table size had
to be reduced by one to prevent seeing the entry that resided at
position [0] twice because it migrated to [iter->size - 1] position.

However, the same thing could happen when the same entry migrates a
second time from [iter->size - 1] to [iter->size - 2] position (and so
on) because the check that we are manipulating the entry just in the [0]
position was insufficient.  Instead of checking the position [pos == 0],
we now check that the [pos % iter->size == 0], thus ignoring all the
entries that might have moved back to the end of the internal table.

(cherry picked from commit acdc57259f)
2024-08-15 12:09:28 +00:00
Ondřej Surý
bbf34c0604 Disassociate the SSL object from the cached SSL_SESSION
When the SSL object was destroyed, it would invalidate all SSL_SESSION
objects including the cached, but not yet used, TLS session objects.

Properly disassociate the SSL object from the SSL_SESSION before we
store it in the TLS session cache, so we can later destroy it without
invalidating the cached TLS sessions.

Co-authored-by: Ondřej Surý <ondrej@isc.org>
Co-authored-by: Artem Boldariev <artem@isc.org>
Co-authored-by: Aram Sargsyan <aram@isc.org>
(cherry picked from commit c11b736e44)
2024-08-07 15:25:29 +00:00
Ondřej Surý
c6daaa4b8c Attach/detach to the listening child socket when accepting TLS
When a TLS (TLSstream) connection was accepted, the child listening
socket was not attached to sock->server and thus it could have been
freed before all the accepted connections were actually closed.

In turn, this would cause us to call isc_tls_free() too soon - causing
cascading errors in pending SSL_read_ex() in the accepted connections.

Properly attach and detach the child listening socket when accepting
and closing the server connections.

(cherry picked from commit 684f3eb8e6)
2024-08-07 15:16:50 +00:00
Ondřej Surý
b0ba2b72e6 Call rcu_barrier() in the isc_mem_destroy() just once
The previous work in this area was led by the belief that we might be
calling call_rcu() from within call_rcu() callbacks.  After carefully
checking all the current callbacks, it became evident that this is not
the case and the problem isn't too few rcu_barrier() calls, but
something else entirely.

Call the rcu_barrier() just once as that's enough and the multiple
rcu_barrier() calls will not hide the real problem anymore, so we can
find it.

(cherry picked from commit 13941c8ca7)
2024-08-05 11:39:30 +00:00
Ondřej Surý
506138ec0f Fix the assertion failure when putting 48-bit number to buffer
When putting the 48-bit number into a fixed-size buffer that's exactly 6
bytes, an assertion failure would occur as the 48-bit number is
internally represented as a 64-bit number and the code was checking if
there is enough space for `sizeof(val)`.  This causes an assertion
failure when an otherwise valid TSIG signature has bad timing
information.

Specify the size of the argument explicitly, so the 48-bit number
doesn't require an 8-byte long buffer.
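
To illustrate the size check, here is a sketch that writes a 48-bit
value in network byte order and requires exactly 6 bytes of space rather
than `sizeof(val)`.  The function and macro names are hypothetical and
this is not the isc_buffer API.

```c
#include <stddef.h>
#include <stdint.h>

/* A 48-bit number occupies 6 bytes on the wire. */
#define UINT48_WIRE_SIZE 6

/*
 * Write the low 48 bits of `val` to `buf` in network byte order.
 * Returns 0 on success, -1 when fewer than 6 bytes are available.
 */
static int
put_uint48(uint8_t *buf, size_t avail, uint64_t val) {
	if (avail < UINT48_WIRE_SIZE) {
		return -1;
	}
	for (int i = 0; i < UINT48_WIRE_SIZE; i++) {
		buf[i] = (uint8_t)(val >> (8 * (UINT48_WIRE_SIZE - 1 - i)));
	}
	return 0;
}
```

Checking against `sizeof(val)` here would demand 8 bytes and reject a
6-byte buffer that is in fact large enough.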

(cherry picked from commit 37dbd57c16)
2024-08-05 11:11:40 +00:00
Ondřej Surý
80738e98bd Fix PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_ERRORCHECK_NP usage
The PTHREAD_MUTEX_ADAPTIVE_NP and PTHREAD_MUTEX_ERRORCHECK_NP are
usually not defines, but enum values, so simple preprocessor check
doesn't work.

Check for PTHREAD_MUTEX_ADAPTIVE_NP from the autoconf AS_COMPILE_IFELSE
block and define HAVE_PTHREAD_MUTEX_ADAPTIVE_NP.  This should enable
adaptive mutex on Linux and FreeBSD.

As PTHREAD_MUTEX_ERRORCHECK actually comes from POSIX and Linux glibc
does define it when compatibility macros are being set, we can just use
PTHREAD_MUTEX_ERRORCHECK instead of PTHREAD_MUTEX_ERRORCHECK_NP.

(cherry picked from commit cc4f99bc6d)
2024-08-05 09:13:07 +00:00
Ondřej Surý
5d76ef21f0 Remove ISC_MUTEX_INITIALIZER
It's hard to get it right on different platforms and it's unused
in BIND 9 anyway.

(cherry picked from commit f158884344)
2024-08-05 09:13:07 +00:00
Mark Andrews
fbcdfefd2d Properly compute the physical memory size
On a 32-bit machine, casting to size_t can still lead to an overflow.
Cast to uint64_t.  Also detect all possible negative values for
pages and pagesize to silence the warning about a possible negative
value.

    39#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
    	1. tainted_data_return: Called function sysconf(_SC_PHYS_PAGES),
           and a possible return value may be less than zero.
    	2. assign: Assigning: pages = sysconf(_SC_PHYS_PAGES).
    40        long pages = sysconf(_SC_PHYS_PAGES);
    41        long pagesize = sysconf(_SC_PAGESIZE);
    42
    	3. Condition pages == -1, taking false branch.
    	4. Condition pagesize == -1, taking false branch.
    43        if (pages == -1 || pagesize == -1) {
    44                return (0);
    45        }
    46
    	5. overflow: The expression (size_t)pages * pagesize might be negative,
           but is used in a context that treats it as unsigned.

    CID 498034: (#1 of 1): Overflowed return value (INTEGER_OVERFLOW)
    6. return_overflow: (size_t)pages * pagesize, which might have underflowed,
       is returned from the function.
    47        return ((size_t)pages * pagesize);
    48#endif /* if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE) */
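
A sketch of the corrected computation under the assumptions stated in
the message: reject any negative value and multiply in uint64_t.  The
function names are hypothetical and the real BIND 9 function is not
reproduced here.

```c
#include <stdint.h>
#include <unistd.h>

/* Multiply in 64 bits; any negative input is treated as "unknown" (0). */
static uint64_t
pages_to_bytes(long pages, long pagesize) {
	if (pages < 0 || pagesize < 0) {
		return 0;
	}
	return (uint64_t)pages * (uint64_t)pagesize;
}

/* Physical memory size in bytes, or 0 when it cannot be determined. */
static uint64_t
physical_memory(void) {
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
	return pages_to_bytes(sysconf(_SC_PHYS_PAGES), sysconf(_SC_PAGESIZE));
#else
	return 0;
#endif
}
```

With a 32-bit size_t, 1048576 pages of 4096 bytes would wrap to 0,
while the 64-bit product is 4294967296.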

(cherry picked from commit e8dbc5db92)
2024-07-31 07:30:35 +00:00
Artem Boldariev
5781ff3a93 Drop expired but not accepted TCP connections
This commit ensures that we are not attempting to accept an expired
TCP connection as we are not interested in any data that could have
been accumulated in its internal buffers. Now we just drop them for
good.
2024-07-03 15:03:02 +03:00
Ondřej Surý
bc3e713317
Throttle the reading when writes are asynchronous
Be more aggressive when throttling the reading - when we can't send the
outgoing TCP synchronously with uv_try_write(), we start throttling the
reading immediately instead of waiting for the send buffers to fill up.

This should not affect well-behaved clients that read the data from the
TCP connection on the other end.
2024-07-03 08:45:39 +02:00
Artem Boldariev
55b1a093ea
Do not un-throttle TCP connections on isc_nm_read()
Due to an omission, it was possible to un-throttle a TCP connection
previously throttled due to the peer not reading back data we are
sending.

In particular, that affected DoH code, but it could also affect other
transports (the current or future ones) that pause/resume reading
according to their internal state.
2024-06-12 13:44:37 +03:00
Ondřej Surý
4c2ac25a95
Limit the number of DNS messages processed from a single TCP read
A single TCP read can yield as many DNS messages as 64k divided by the
minimum size of a DNS message.  This can clog the processing thread and
thrash the memory allocator because we need to do as many as ~20k
allocations in a single UV loop tick.

Limit the number of DNS messages processed in a single UV loop tick to
just a single DNS message and limit the number of outstanding DNS
messages back to 23.  This effectively limits the number of pipelined
DNS messages to that number (this is the limit we already had before).
2024-06-10 16:48:54 +02:00
Ondřej Surý
4e7c4af17f
Throttle reading from TCP if the sends are not getting through
When a TCP client would not read the DNS messages sent to it, the TCP
sends inside named would accumulate and cause degradation of the
service.  Throttle the reading from the TCP socket when we accumulate
enough DNS data to be sent.  Currently the limit is set so that a
single largest possible DNS message can fit into the buffer.
2024-06-10 16:48:52 +02:00
Artem Boldariev
d80dfbf745
Keep the endpoints set reference within an HTTP/2 socket
This commit ensures that an HTTP endpoints set reference is stored in
a socket object associated with an HTTP/2 stream instead of
referencing the global set stored inside a listener.

This helps to prevent an issue like the following:

1. BIND is configured to serve DoH clients;
2. A client is connected and one or more HTTP/2 streams are
created. Internal pointers are now pointing to the data on the
associated HTTP endpoints set;
3. BIND is reconfigured - the new endpoints set object is created and
promoted to all listeners;
4. The old pointers to the HTTP endpoints set data are now invalid.

Instead of referencing a global object that is updated on
re-configurations, we now store a local reference which prevents the
endpoints set objects from going out of scope prematurely.
2024-06-10 16:40:12 +02:00
Artem Boldariev
c41fb499b9
DoH: avoid potential use after free for HTTP/2 session objects
It was reported that an HTTP/2 session might get closed or even deleted
before all async. processing has been completed.

This commit addresses that: we now avoid using the object when we do
not need it, specifically check that the pointers used are not 'NULL',
and ensure that there is at least one reference to the session object
while we are processing incoming data.

This commit makes the code more resilient to such issues in the
future.
2024-06-10 16:40:10 +02:00
Ondřej Surý
a9b4d42346 Add isc_queue implementation on top of cds_wfcq
Add an isc_queue implementation that hides the gory details of cds_wfcq
behind a neater API.  The same caveats apply as with cds_wfcq.

TODO: Add documentation to the API.
2024-06-05 09:19:56 +02:00