Commit graph

14519 commits

Author SHA1 Message Date
Artem Boldariev
d0907a3a1f TLS DNS: Simplify tls_cycle_input()
This commit simplifies code flow in the tls_cycle_input() and makes
the incoming data processing similar to that in TCP DNS. In
particular, now we decipher all the the incoming data before making a
single isc__nm_process_sock_buffer() call. Previously we would try to
decipher data bit-by-bit before trying to process the deciphered bit
via isc__nm_process_sock_buffer(). Doing like before made the code
much less predictable, in particular in the areas like when reading is
paused or resumed.

The newer approach also allowed us to get rid of some old kludges.
2025-03-24 09:49:38 +02:00
Aram Sargsyan
ac15d3dede Implement -T cookiealwaysvalid
When -T cookiealwaysvalid is passed to named, DNS cookie checks for
the incoming queries always pass, given they are structurally correct.

(cherry picked from commit 807ef8545d)
2025-03-17 12:01:42 +00:00
Mark Andrews
89e76eec70 Add missing locks when returning addresses
Add missing locks in dns_zone_getxfrsource4 et al. Addresses CID
468706, 468708, 468741, 468742, 468785 and 468778.

Cleanup dns_zone_setxfrsource4 et al to now return void.

Remove double copies with dns_zone_getprimaryaddr and dns_zone_getsourceaddr.

(cherry picked from commit d0a59277fb)
2025-03-16 15:26:12 +11:00
Mark Andrews
54c89f75f3 Implement digest_sig and digest_rrsig for ZONEMD
ZONEMD needs to be able to digest SIG and RRSIG records.  The signer
field can be compressed in SIG so we need to call dns_name_digest().
While for RRSIG the records the signer field is not compressed the
canonical form has the signer field downcased (RFC 4034, 6.2).  This
also implies that compare_rrsig needs to downcase the signer field
during comparison.

(cherry picked from commit 006c5990ce)
2025-03-05 10:34:52 +00:00
Aram Sargsyan
df373d7d99 Fix memory ordering for operations with quota->used and quota->waiting
Change all the non-locked operations on 'quota->used' and
'quota->waiting' to "acq/rel" for inter-thread synchronization. Some
loads are left as "relaxed", because they are under a locked mutex
which also provides protection.
2025-03-04 09:57:34 +00:00
Aram Sargsyan
80d7d11f37 Use relaxed memory ordering for quota->max and quota->soft
These variables are not critical for memory ordering issues
and we can use the relaxed memory ordering, as done in the
main branch.
2025-03-04 09:57:34 +00:00
Artem Boldariev
94bcd8c253 DoH: Bump the active streams processing limit
This commit bumps the total number of active streams (= the opened
streams for which a request is received, but response is not ready) to
60% of the total streams limit.

The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.

(cherry picked from commit eaad0aefe6)
2025-03-03 12:08:15 +02:00
Artem Boldariev
aa6fd85b0b DoH: remove obsolete INSIST() check
The check, while not active by default, is not valid since the commit
8b8f4d500d.

See 'if (total == 0) { ...' below branch to understand why.

(cherry picked from commit 217a1ebd79)
2025-03-03 12:07:48 +02:00
Artem Boldariev
d9928ccb62 DoH: Flush HTTP write buffer on an outgoing DNS message
Previously, the code would try to avoid sending any data regardless of
what it is unless:

a) The flush limit is reached;
b) There are no sends in flight.

This strategy is used to avoid too numerous send requests with little
amount of data. However, it has been proven to be too aggressive and,
in fact, harms performance in some cases (e.g., on longer (≥1h) runs
of "stress:long:rpz:doh+udp:linux:*").

Now, additionally to the listed cases, we also:

c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).

That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.

(cherry picked from commit c5f7968856)
2025-03-03 12:07:15 +02:00
Artem Boldariev
b4e8089694 DoH: Limit the number of delayed IO processing requests
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.

This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.

(cherry picked from commit 0e1b02868a)
2025-03-03 12:06:44 +02:00
Artem Boldariev
e525029b89 DoH: Simplify http_do_bio()
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that the way it was structured
before was indirectly caused by the presence of the missing callback
calls bug, fixed in 8b8f4d500d.

The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.

Additionally, it fixes a similar issue with potentially missing send
callback calls processing and hardens the code against use-after-free
errors related to the session object (they can potentially occur).

(cherry picked from commit 0956fb9b9e)
2025-03-03 12:06:05 +02:00
Evan Hunt
90989bfdfb prevent a reference leak from the ns_query_done hooks
if the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on shut
down.

this has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().

(cherry picked from commit c2e4358267)
2025-02-25 22:41:27 +00:00
Evan Hunt
9bec99ce7d Fix a logic error in cache_name()
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.

(cherry picked from commit d0fd9cbe3b)
2025-02-24 23:42:30 +00:00
Aram Sargsyan
f1ec774f9a Fix RPZ bug when resuming a query during a reconfiguration
After a reconfiguration the old view can be left without a valid
'rpzs' member, because when the RPZ is not changed during the named
reconfiguration 'rpzs' "migrate" from the old view into the new
view, so when a query resumes it can find that 'qctx->view->rpzs'
is NULL which query_resume() currently doesn't expect to happen if
it's recursing and 'qctx->rpz_st' is not NULL.

Fix the issue by adding a NULL-check. In order to not split the log
message to two different log messages depending on whether
'qctx->view->rpzs' is NULL or not, change the message to not log
the RPZ policy's "version" which is just a runtime counter and is
most likely not very useful for the users.

(cherry picked from commit 3ea2fbc238)
2025-02-21 11:45:52 +00:00
Mark Andrews
2c42324e26 Remove check for missing RRSIG records from getsection
Checking whether the authority section is properly signed should
be left to the validator.  Checking in getsection (dns_message_parse)
was way too early and resulted in resolution failures of lookups
that should have otherwise succeeded.

(cherry picked from commit 83159d0a54)
2025-02-21 14:20:54 +11:00
Evan Hunt
b2e11b1ad3 Check whether a rejected rrset is different
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.

When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.

(cherry picked from commit 6aba56ae89)
2025-02-19 18:55:01 -08:00
Artem Boldariev
66bdddc51a DoH: http_send_outgoing() return value is not used
The value returned by http_send_outgoing() is not used anywhere, so we
make it not return anything (void). Probably it is an omission from
older times.

(cherry picked from commit 2adabe835a)
2025-02-19 19:42:15 +02:00
Artem Boldariev
0b9e8e6063 DoH: Fix missing send callback calls
When handling outgoing data, there were a couple of rarely executed
code paths that would not take into account that the callback MUST be
called.

It could lead to potential memory leaks and consequent shutdown hangs.

(cherry picked from commit 8b8f4d500d)
2025-02-19 19:42:15 +02:00
Artem Boldariev
f9aa7a298d DoH: change how the active streams number is calculated
This commit changes the way how the number of active HTTP streams is
calculated and allows it to scale with the values of the maximum
amount of streams per connection, instead of effectively capping at
STREAM_CLIENTS_PER_CONN.

The original limit, which is intended to define the pipelining limit
for TCP/DoT. However, it appeared to be too restrictive for DoH, as it
works quite differently and implements pipelining at protocol level by
the means of multiplexing multiple streams. That renders each stream
to be effectively a separate connection from the point of view of the
rest of the codebase.

(cherry picked from commit a22bc2d7d4)
2025-02-19 19:42:15 +02:00
Artem Boldariev
3c49824589 DoH: Track the amount of in flight outgoing data
Previously we would limit the amount of incoming data to process based
solely on the presence of not completed send requests. That worked,
however, it was found to severely degrade performance in certain
cases, as was revealed during extended testing.

Now we switch to keeping track of how much data is in flight (or ready
to be in flight) and limit the amount of processed incoming data when
the amount of in flight data surpasses the given threshold, similarly
to like we do in other transports.

(cherry picked from commit 05e8a50818)
2025-02-19 19:42:15 +02:00
Mark Andrews
86e65f317a Re-fetch pending records that failed validation
If a deferred validation on data that was originally queried with
CD=1 fails, we now repeat the query, since the zone data may have
changed in the meantime.

(cherry picked from commit 04b1484ed8)
2025-02-18 23:59:10 +00:00
Mark Andrews
48b32e64c4 Complete the deferred validation if there are no RRSIGs
When a query is made with CD=1, we store the result in the
cache marked pending so that it can be validated later, at
which time it will either be accepted as an answer or removed
from the cache as invalid.  Deferred validation was not
attempted when there were no cached RRSIGs for DNSKEY and
DS.  We now complete the deferred validation in this scenario.

(cherry picked from commit 8b900d1808)
2025-02-18 23:59:10 +00:00
Aram Sargsyan
fc24cfd71d Fix a race issue in dns_view_addzone()
Views use two types of reference counting - regular and weak, and
when there are no more regular references, the view_flushanddetach()
function destroys or detaches some parts of the view, including
'view->zonetable', while other parts are freed by destroy() when
the last weak reference is detached. Since catalog zones use weak
references to attach a view, it's currently possible that during
shutdown catalog zone processing will try to add a new zone into
an otherwise unused view (because it's shutting down) which doesn't
have an attached zonetable any more. This could cause an assertion
failure. Fix this issue by modifying the dns_view_addzone() function
to expect that 'view->zonetable' can be NULL, and in that case just
return ISC_R_SHUTTINGDOWN.
2025-02-17 17:21:38 +00:00
Mark Andrews
7111f5e4c2 Fix "CNAME and other data" detection
prio_type was being used in the wrong place to optimize cname_and_other.
We have to first exclude and accepted types and we also have to
determine that the record exists before we can check if we are at
a point where a later CNAME cannot appear.

(cherry picked from commit 5e49a9e4ae)
2025-02-14 13:44:47 +11:00
Ondřej Surý
c9288ea3d7
Print the expiration time of the stale records (not ancient)
In #1870, the expiration time of ANCIENT records were printed, but
actually the ancient records are very short lived, and the information
carries a little value.

Instead of printing the expiration of ANCIENT records, print the
expiration time of STALE records.
2025-02-04 18:07:30 +01:00
Ondřej Surý
0c064cfde4
Expand the usage of set_ttl() before mark_header_ancient()
When the mark_header_ancient() helper function was introduced, couple of
places with duplicate (or almost duplicate) code was missed.  Add
missing set_ttl() calls before mark_header_ancient(), so the handling of
expiring headers is same in all places.

(concept cherry picked from commit 58179e6a19)
2025-02-03 15:12:59 +01:00
Ondřej Surý
63e8af9270
Add better ZEROTTL handling in bindrdataset()
If we know that the header has ZEROTTL set, the server should never send
stale records for it and the TTL should never be anything else than 0.
The comment was already there, but the code was not matching the
comment.

(cherry picked from commit cfee6aa565)
2025-02-03 15:04:36 +01:00
Ondřej Surý
9a8483bece
In cache, set rdataset TTL to 0 when the header is not active
When the header has been marked as ANCIENT, but the ttl hasn't been
reset (this happens in couple of places), the rdataset TTL would be
set to the header timestamp instead to a reasonable TTL value.

Since this header has been already expired (ANCIENT is set), set the
rdataset TTL to 0 and don't reuse this field to print the expiration
time when dumping the cache.  Instead of printing the time, we now
just print 'expired (awaiting cleanup'.

(cherry picked from commit 1bbb57f81b)
2025-02-03 15:04:36 +01:00
Evan Hunt
291d0d8d90 fix the cache findzonecut implementation
the search for the deepest known zone cut in the cache could
improperly reject a node containing stale data, even if the
NS rdataset wasn't the data that was stale.

this change also improves the efficiency of the search by
stopping it when both NS and RRSIG(NS) have been found.

(cherry picked from commit 1f095b902c)
2025-02-02 13:22:32 -08:00
Ondřej Surý
2974bbba80
Clarify reference counting in RBTDB database
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:

- new_reference() and decrement_references() are now called
  dns__rbtnode_acquire() and dns__rbtnode_release()
  respectively; this reflects the fact that they modify both
  the internal and external reference counters for a node.

- rbtnode_newref() and rbtnode_decref are now called
  rbtnode_erefs_increment() and rbtnode_erefs_decrement(),
  to reflect that they only increase and decrease the node's
  external reference counters, not internal.

(cherry picked from commit 857225aeb6)
2025-01-31 06:15:13 +01:00
Ondřej Surý
8465e4516f
Refactor node reference counting in rbtdb.c
Refactor the pattern in the newref() and decref() functions in rbtdb.c
following the pattern, so it follows the similar pattern we already have
for QPDB.

(cherry picked from commit 9c45de9473)
2025-01-31 06:01:29 +01:00
Michał Kępień
64367010f2
Fix "rndc flushname" for longer name server names
dns_adb_flushname() calls dns_name_hash() to determine the ADB bucket
number to search for the given name.  Meanwhile, all other functions in
lib/dns/adb.c call dns_name_fullhash() for determining the bucket number
instead.  This discrepancy causes dns_adb_flushname() to have virtually
no chances of actually removing the given name from the ADB if the
name is longer than 16 bytes (since dns_name_hash() only hashes the
first 16 bytes of the name provided to it) - more specifically, the
probability of success for names longer than 16 bytes is inversely
proportional to the number of ADB buckets in use, i.e. 1:1021 at best.

Fix by using dns_name_fullhash() instead of dns_name_hash() in
dns_adb_flushname(), so that the logic for determining the bucket number
that a given name belongs to is consistent throughout lib/dns/adb.c.
2025-01-30 07:44:18 +01:00
Andoni Duarte Pintado
73997c8161 Merge tag 'v9.18.33' into bind-9.18 2025-01-29 17:23:11 +01:00
Ondřej Surý
b14df7d459
Stop the timer when shuttingdown the fetch context
When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer.  As there are
multiple places where fctx_shutdown() is being called without stopping
the timer, move the fctx_stoptimer() to fctx_shutdown() and cleanup the
explicit usage.
2025-01-23 17:46:37 +01:00
Mark Andrews
8790d5cd22 Terminate yaml string after negative comment
(cherry picked from commit 89afc11389)
2025-01-22 23:58:54 +00:00
Ondřej Surý
239f4104da
Remove memory limit on ADB finds and fetches
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks).  This could lead to a situation
where the resolver-heavy load would force evict ADB objects from the
database to point where ADB is completely empty, leading to even more
resolver-heavy load.

Make the short lived ADB objects use the other memory context that we
already created for the hashmaps.  This makes the ADB overmem condition
to not be triggered by the ongoing resolver fetches.

(cherry picked from commit 05faff6d53)
2025-01-22 15:29:27 +01:00
Ondřej Surý
4cc1160e4d
Replace linked lists with the hashtables to hold fetch contexts
When the recursive-clients value is too large, the linked lists holding
the fetch contexts can also grow large and since the algorithm to merge
outgoing queries is quadratic, named can get slow.

Replace the linked list with hashtable for faster lookups.  This also
allows us to reduce the number of tasks (buckets) in the resolver.
2025-01-22 15:06:04 +01:00
JINMEI Tatuya
065ffb2eb8
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0.  In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.

Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>

(cherry picked from commit 7f4471594d)
2025-01-22 14:31:09 +01:00
Ondřej Surý
8bf311c769
Shutdown the fetch context after canceling the last fetch
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer.  This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.

Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
2025-01-22 14:21:51 +01:00
Ondřej Surý
1b9d949534
Remove --with-tuning=small/large configuration option
The last remaining tuning value was RESOLVER_NTASKS and instead of
having variable number of the tasks per-cpu and in named and in
dns_client, set the number of the resolver tasks to 523 (number taken
from dns_client unit) to accomodate most of the recursive-clients
values.
2025-01-22 14:16:40 +01:00
Ondřej Surý
d8206a939c
Reduce struct isc__nm_uvreq size from 1560 to 560 bytes
The uv_req union member of struct isc__nm_uvreq contained libuv request
types that we don't use.  Turns out that uv_getnameinfo_t is 1000 bytes
big and unnecessarily enlarged the whole structure.  Remove all the
unused members from the uv_req union.
2025-01-22 14:12:38 +01:00
Ondřej Surý
a7630c2c62
Reduce sizeof isc_sockaddr from 152 to 48 bytes
After removing sockaddr_unix from isc_sockaddr, we can also remove
sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just
48 bytes needed to hold IPv6 addresses.

(cherry picked from commit 2367b6a2e1)
2025-01-22 14:12:38 +01:00
Artem Boldariev
550b692343 DoH: reduce excessive bad request logging
We started using isc_nm_bad_request() more actively throughout
codebase. In the case of HTTP/2 it can lead to a large count of
useless "Bad Request" messages in the BIND log, as often we attempt to
send such request over effectively finished HTTP/2 sessions.

This commit fixes that.

(cherry picked from commit 937b5f8349)
2025-01-15 16:50:13 +01:00
Artem Boldariev
796708775d DoH: introduce manual read timer control
This commit introduces manual read timer control as used by StreamDNS
and its underlying transports. Before that, DoH code would rely on the
timer control provided by TCP, which would reset the timer any time
some data arrived. Now, the timer is restarted only when a full DNS
message is processed in line with other DNS transports.

That change is required because we should not stop the timer when
reading from the network is paused due to throttling. We need a way to
drop timed-out clients, particularly those who refuse to read the data
we send.

(cherry picked from commit 609a41517b)
2025-01-15 16:49:32 +01:00
Artem Boldariev
ee42514be2 DoH: floodding clients detection
This commit adds logic to make code better protected against clients
that send valid HTTP/2 data that is useless from a DNS server
perspective.

Firstly, it adds logic that protects against clients who send too
little useful (=DNS) data. We achieve that by adding a check that
eventually detects such clients with a nonfavorable useful to
processed data ratio after the initial grace period. The grace period
is limited to processing 128 KiB of data, which should be enough for
sending the largest possible DNS message in a GET request and then
some. This is the main safety belt that would detect even flooding
clients that initially behave well in order to fool the checks server.

Secondly, in addition to the above, we introduce additional checks to
detect outright misbehaving clients earlier:

The code will treat clients that open too many streams (50) without
sending any data for processing as flooding ones; The clients that
managed to send 1.5 KiB of data without opening a single stream or
submitting at least some DNS data will be treated as flooding ones.
Of course, the behaviour described above is nothing else but
heuristical checks, so they can never be perfect. At the same time,
they should be reasonable enough not to drop any valid clients,
realatively easy to implement, and have negligible computational
overhead.

(cherry picked from commit 3425e4b1d0)
2025-01-15 16:49:23 +01:00
Artem Boldariev
11a2956dce DoH: process data chunk by chunk instead of all at once
Initially, our DNS-over-HTTP(S) implementation would try to process as
much incoming data from the network as possible. However, that might
be undesirable as we might create too many streams (each effectively
backed by a ns_client_t object). That is too forgiving as it might
overwhelm the server and trash its memory allocator, causing high CPU
and memory usage.

Instead of doing that, we resort to processing incoming data using a
chunk-by-chunk processing strategy. That is, we split data into small
chunks (currently 256 bytes) and process each of them
asynchronously. However, we can process more than one chunk at
once (up to 4 currently), given that the number of HTTP/2 streams has
not increased while processing a chunk.

That alone is not enough, though. In addition to the above, we should
limit the number of active streams: these streams for which we have
received a request and started processing it (the ones for which a
read callback was called), as it is perfectly fine to have more opened
streams than active ones. In the case we have reached or surpassed the
limit of active streams, we stop reading AND processing the data from
the remote peer. The number of active streams is effectively decreased
only when responses associated with the active streams are sent to the
remote peer.

Overall, this strategy is very similar to the one used for other
stream-based DNS transports like TCP and TLS.

(cherry picked from commit 9846f395ad)
2025-01-15 16:47:21 +01:00
Artem Boldariev
125bfd71d3 Add isc__nm_async_run()
This commit adds isc__nm_async_run() which is very similar to
isc_async_run() in newer versions of BIND: it allows calling a
callback asynchronously.

Potentially, it can be used to replace some other async operations in
other networking code, in particular the delayed I/O calls in TLS a
TCP DNS transports to name a few and remove quiet a lot of code, but
it we are unlikely to do that for the strictly maintenance only
branch, so it is protected with DoH-related #ifdefs.

It is implemented in a "universal" way mainly because doing it in the
specific code requires the same amount of code and is not simpler.
2025-01-15 16:43:47 +01:00
Artem Boldariev
13d521fa5f Implement TLS manual read timer control functionality
This commit adds a manual TLS read timer control mode which is
supposed to override automatic resetting of the timer when any data is
received.

It both depends and complements similar functionality in TCP.
2025-01-15 15:34:43 +00:00
Artem Boldariev
a67b325542 Implement TCP manual read timer control functionality
This commit adds a manual TCP read timer control mode which is
supposed to override automatic resetting of the timer when any data is
received. That can be accomplished by
`isc__nmhandle_set_manual_timer()`.

This functionality is supposed to be used by multilevel networking
transports which require finer grained control over the read
timer (TLS Stream, DoH).

The commit is essentially an implementation of the functionality from
newer versions of BIND.
2025-01-15 15:34:43 +00:00
Ondřej Surý
fa7b7973e3 Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if
the answer has more than 13 names in the RDATA.  This limits the
number of lookups into the database(s) during a single client query,
reducing query processing load.

Also, don't append any additional data to type=ANY queries. The
answer to ANY is already big enough.

(cherry picked from commit a1982cf1bb)
2025-01-15 14:13:45 +01:00