Commit graph

39666 commits

Author SHA1 Message Date
Michal Nowak
48a77b943a
Fix broken links in documentation
Some detected links are not to be verified (127.*, dnssec-or-not.com)
and some I can't fix (flaticon, godaddy, icann), but they are not
crucial.

(cherry picked from commit 8302469507)
2025-01-24 14:38:52 +01:00
Matthijs Mekking
13f3e88a8e [9.18] chg: doc: Document how secondaries refresh a zone in the ARM
Closes #5123

Backport of MR !9966

Merge branch 'backport-5123-document-refreshing-a-secondary-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9987
2025-01-24 09:07:30 +00:00
Matthijs Mekking
8777a33e3e Document how secondaries refresh a zone in the ARM
We have a KB article that describes this, put a condensed version into
the ARM.

(cherry picked from commit 8daf3782d1)
2025-01-24 09:07:21 +00:00
Nicki Křížek
46be0e9838 [9.18] chg: ci: Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.

The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with occasional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.

The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.

Exceeding either of those values should be quite a clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.

Backport of MR !9950

Merge branch 'backport-nicki/ci-respdiff-limits-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9990
2025-01-23 17:48:18 +00:00
Nicki Křížek
1c1a7aa2e1 Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.

The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with occasional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.

The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.

Exceeding either of those values should be quite a clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.

(cherry picked from commit 0584d3f65f)
2025-01-23 18:31:38 +01:00
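The limits described in this commit message amount to a simple threshold check per CI job. The following is a minimal sketch, not the actual CI configuration: the job names and percentages come from the message, while the function name, the table name, and the choice of an inclusive comparison are assumptions made for illustration.

```python
# Maximum allowed disagreements, in percent, per respdiff job.
# Values are taken from the commit message; the table itself is hypothetical.
LIMITS = {
    "respdiff": 0.15,
    "respdiff:asan": 0.15,
    "respdiff:tsan": 0.15,
    "respdiff:third-party": 0.2,
}


def within_limit(job: str, disagreements: int, total: int) -> bool:
    """Return True if the share of disagreeing answers is within the limit.

    Assumption: a value exactly at the limit still passes.
    """
    percent = 100.0 * disagreements / total
    return percent <= LIMITS[job]
```

With these limits, a typical run at around 0.07 % passes comfortably, while a run at 0.16 % would fail every job except respdiff:third-party.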
Ondřej Surý
2c8d9c490c fix: nil: Stop the timer when canceling the last fetch
When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer.

Closes #5136

Merge branch '5136-stop-timer-when-canceling-last-fetch-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9988
2025-01-23 17:25:37 +00:00
Ondřej Surý
b14df7d459
Stop the timer when shutting down the fetch context
When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer.  As there are
multiple places where fctx_shutdown() is being called without stopping
the timer, move the fctx_stoptimer() to fctx_shutdown() and clean up the
explicit usage.
2025-01-23 17:46:37 +01:00
Matthijs Mekking
8e631afebe [9.18] fix: doc: Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.

Add text to clarify that the interval is calculated from the new
validity period.

Closes #5128

Backport of MR !9955

Merge branch 'backport-5128-clarify-dnssec-signzone-interval-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9984
2025-01-23 11:53:34 +00:00
Matthijs Mekking
7545157fe8 Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.

Add text to clarify that the interval is calculated from the new
validity period.

(cherry picked from commit ae42fa69fa)
2025-01-23 11:13:15 +00:00
Mark Andrews
132947c0ba [9.18] fix: usr: Yaml string not terminated in negative response in delv
Closes #5098

Backport of MR !9922

Merge branch 'backport-5098-missing-yaml-string-termination-delv-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9980
2025-01-23 00:40:13 +00:00
Mark Andrews
60c441eeff Check delv +yaml negative response output
(cherry picked from commit 9c04640def)
2025-01-22 23:58:54 +00:00
Mark Andrews
8790d5cd22 Terminate yaml string after negative comment
(cherry picked from commit 89afc11389)
2025-01-22 23:58:54 +00:00
Ondřej Surý
7c90bd5bb3 [9.18] fix: usr: Apply the memory limit only to ADB database items
A resolver under heavy load could exhaust the memory available for storing
the information in the Address Database (ADB), effectively evicting information
already stored in the ADB.  The memory used to retrieve and provide
information from the ADB is no longer subject to the memory limits
that are applied for storing the information in the Address Database.

Closes #5127

Backport of MR !9954

Merge branch 'backport-5127-change-ADB-memory-split-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9976
2025-01-22 14:30:05 +00:00
Ondřej Surý
239f4104da
Remove memory limit on ADB finds and fetches
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks).  This could lead to a situation
where heavy resolver load would force-evict ADB objects from the
database to the point where the ADB is completely empty, leading to even
more resolver load.

Make the short lived ADB objects use the other memory context that we
already created for the hashmaps.  This prevents the ADB overmem condition
from being triggered by the ongoing resolver fetches.

(cherry picked from commit 05faff6d53)
2025-01-22 15:29:27 +01:00
Ondřej Surý
2c667bc9c6 [9.18] fix: usr: Improve the resolver performance under attack
A remote client can force the DNS resolver component to consume memory faster than the resources for the canceled resolver fetches are cleaned up due to the `recursive-clients` limit. If such a traffic pattern is sustained for a long period of time, the DNS server might eventually run out of available memory. This has been fixed.

It should be noted that under such a heavy attack, for BIND 9 versions both with and without the fix, no outgoing DNS queries will be successful, as the generated traffic pattern will consume all the available slots for the recursive clients.

Merge branch '5110-backport-the-hashtable-use-for-fetchcontexts-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9961
2025-01-22 14:27:44 +00:00
Ondřej Surý
4cc1160e4d
Replace linked lists with the hashtables to hold fetch contexts
When the recursive-clients value is too large, the linked lists holding
the fetch contexts can also grow large and since the algorithm to merge
outgoing queries is quadratic, named can get slow.

Replace the linked list with a hashtable for faster lookups.  This also
allows us to reduce the number of tasks (buckets) in the resolver.
2025-01-22 15:06:04 +01:00
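The difference between the two data structures can be illustrated with a small sketch. This is not the resolver code (which is written in C): the class name, the method name, and the choice of (query name, query type) as the key are hypothetical. It only shows why a hash lookup avoids scanning every existing fetch context when a new query arrives.

```python
from typing import Dict, List, Tuple

FetchKey = Tuple[str, str]  # hypothetical key: (query name, query type)


class FetchContexts:
    """Hypothetical table of in-flight fetch contexts keyed by FetchKey."""

    def __init__(self) -> None:
        self._by_key: Dict[FetchKey, List[str]] = {}

    def join_or_create(self, name: str, qtype: str, client: str) -> bool:
        """Attach a client to a matching context, or create a new one.

        Returns True when a new context had to be created. The lookup is a
        single hash probe instead of a walk over a linked list.
        """
        key = (name.lower(), qtype.upper())
        waiters = self._by_key.get(key)
        if waiters is None:
            self._by_key[key] = [client]
            return True
        waiters.append(client)
        return False
```

With a linked list, each of N incoming queries may compare itself against up to N existing contexts, which is where the quadratic cost mentioned in the message comes from.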
Ondřej Surý
43c77d95f1 [9.18] fix: usr: Avoid unnecessary locking in the zone/cache database
Prevent lock contention among many worker threads referring to the same database node at the same time. This would improve zone and cache database performance for the heavily contended database nodes.

Backport of !9963 

Closes #5130

Merge branch '5130-reduce-lock-contention-in-decrement-reference-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9965
2025-01-22 13:31:39 +00:00
JINMEI Tatuya
065ffb2eb8
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquired a node write lock if it
might need node cleanup in case the reference decrements to 0.  In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquiring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent a noticeable performance
drop, such as query timeouts, in such cases.

Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>

(cherry picked from commit 7f4471594d)
2025-01-22 14:31:09 +01:00
Ondřej Surý
57187b2c4f [9.18] chg: dev: Shutdown the fetch context after canceling the last fetch
Shutdown the fetch context immediately after the last fetch has been canceled from that particular fetch context.

Merge branch 'ondrej/shutdown-the-fetch-context-early-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9960
2025-01-22 13:22:26 +00:00
Ondřej Surý
8bf311c769
Shutdown the fetch context after canceling the last fetch
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer.  This can lead to a situation where the number of
outgoing recursing clients exceeds the configured number for
recursive-clients.

Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
2025-01-22 14:21:51 +01:00
Ondřej Surý
327b666c6d [9.18] rem: usr: Remove --with-tuning=small/large configuration option
The configuration option --with-tuning has been removed as it is no longer required or desired.

Merge branch 'ondrej/remove-tuning-large-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9959
2025-01-22 13:17:13 +00:00
Ondřej Surý
1b9d949534
Remove --with-tuning=small/large configuration option
The last remaining tuning value was RESOLVER_NTASKS and instead of
having a variable number of tasks per CPU, both in named and in
dns_client, set the number of the resolver tasks to 523 (number taken
from the dns_client unit) to accommodate most of the recursive-clients
values.
2025-01-22 14:16:40 +01:00
Ondřej Surý
008e520109 [9.18] chg: dev: Reduce memory sizes of common structures
* Reduce `sizeof(isc_sockaddr_t)` from 152 to 48 bytes
* Reduce `sizeof(struct isc__nm_uvreq)` from 1560 to 560 bytes

Partial backport of !8299

Merge branch 'ondrej/reduce-netmgr-memory-usage-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9953
2025-01-22 13:13:01 +00:00
Ondřej Surý
d8206a939c
Reduce struct isc__nm_uvreq size from 1560 to 560 bytes
The uv_req union member of struct isc__nm_uvreq contained libuv request
types that we don't use.  Turns out that uv_getnameinfo_t is 1000 bytes
big and unnecessarily enlarged the whole structure.  Remove all the
unused members from the uv_req union.
2025-01-22 14:12:38 +01:00
Ondřej Surý
a7630c2c62
Reduce sizeof isc_sockaddr from 152 to 48 bytes
After removing sockaddr_unix from isc_sockaddr, we can also remove
sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just
48 bytes needed to hold IPv6 addresses.

(cherry picked from commit 2367b6a2e1)
2025-01-22 14:12:38 +01:00
Colin Vidal
e487294ce4 [9.18] new: nil: ignore TAGS files
Backport of MR !9956

Merge branch 'backport-colin/ignoreTAGS-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9974
2025-01-22 12:09:02 +00:00
Colin Vidal
9c5d1ebe28 ignore TAGS files
TAGS files are generated from `make tags` using etags. Other index tags
are already ignored (GTAGS, GPATH, etc.). Also ignoring `TAGS`.

(cherry picked from commit 2164ea8abd)
2025-01-22 11:23:33 +00:00
Andoni Duarte
766b7bcf7e chg: doc: Set up version for BIND 9.18.34
Merge branch 'andoni/set-up-version-for-bind-9.18.34' into 'bind-9.18'

See merge request isc-projects/bind9!9970
2025-01-22 08:33:26 +00:00
Andoni Duarte Pintado
10680d143c Update BIND version to 9.18.34-dev 2025-01-21 17:55:04 +01:00
Nicki Křížek
6fc161b582 Update BIND version for release 2025-01-20 14:35:25 +01:00
Nicki Křížek
12805f9184 new: doc: Prepare documentation for BIND 9.18.33
Merge branch 'andoni/prepare-documentation-for-bind-9.18.33' into 'v9.18.33-release'

See merge request isc-private/bind9!774
2025-01-20 13:32:58 +00:00
Andoni Duarte Pintado
bee47c986f
Tweak and reword release notes 2025-01-18 06:07:18 +01:00
Andoni Duarte Pintado
bcff826fba
Fix broken option reference in the ARM 2025-01-18 06:07:18 +01:00
Andoni Duarte Pintado
152684faf7 Prepare release notes for BIND 9.18.33 2025-01-16 16:39:21 +01:00
Andoni Duarte Pintado
d48290afe5 Generate changelog for BIND 9.18.33 2025-01-16 16:38:10 +01:00
Andoni Duarte
e733e62414 [9.18] [CVE-2024-12705] sec: usr: DNS-over-HTTP(s) flooding fixes
Fix DNS-over-HTTP(S) implementation issues that arise under heavy
query load. Optimize resource usage for :iscman:`named` instances
that accept queries over DNS-over-HTTP(S).

Previously, :iscman:`named` would process all incoming HTTP/2 data
at once, which could overwhelm the server, especially when dealing
with clients that send requests but don't wait for responses. That
has been fixed. Now, :iscman:`named` handles HTTP/2 data in smaller
chunks and throttles reading until the remote side reads the
response data. It also throttles clients that send too many requests
at once.

Additionally, :iscman:`named` now carefully processes data sent by
some clients, which can be considered "flooding." It logs these
clients and drops connections from them.
:gl:`#4795`

In some cases, :iscman:`named` could leave DNS-over-HTTP(S)
connections in the `CLOSE_WAIT` state indefinitely. That also has
been fixed. ISC would like to thank JF Billaud for thoroughly
investigating the issue and verifying the fix.
:gl:`#5083`

See https://gitlab.isc.org/isc-projects/bind9/-/issues/4795

Closes https://gitlab.isc.org/isc-projects/bind9/-/issues/5083

Backport of !732.

Merge branch 'artem-improve-doh-resource-usage-9.18' into 'v9.18.33-release'

See merge request isc-private/bind9!763
2025-01-15 16:03:28 +00:00
Artem Boldariev
550b692343 DoH: reduce excessive bad request logging
We started using isc_nm_bad_request() more actively throughout the
codebase. In the case of HTTP/2 it can lead to a large number of
useless "Bad Request" messages in the BIND log, as we often attempt to
send such requests over effectively finished HTTP/2 sessions.

This commit fixes that.

(cherry picked from commit 937b5f8349)
2025-01-15 16:50:13 +01:00
Artem Boldariev
796708775d DoH: introduce manual read timer control
This commit introduces manual read timer control as used by StreamDNS
and its underlying transports. Before that, DoH code would rely on the
timer control provided by TCP, which would reset the timer any time
some data arrived. Now, the timer is restarted only when a full DNS
message is processed in line with other DNS transports.

That change is required because we should not stop the timer when
reading from the network is paused due to throttling. We need a way to
drop timed-out clients, particularly those who refuse to read the data
we send.

(cherry picked from commit 609a41517b)
2025-01-15 16:49:32 +01:00
Artem Boldariev
ee42514be2 DoH: flooding clients detection
This commit adds logic to make code better protected against clients
that send valid HTTP/2 data that is useless from a DNS server
perspective.

Firstly, it adds logic that protects against clients who send too
little useful (=DNS) data. We achieve that by adding a check that
eventually detects such clients with an unfavorable useful-to-processed
data ratio after the initial grace period. The grace period
is limited to processing 128 KiB of data, which should be enough for
sending the largest possible DNS message in a GET request and then
some. This is the main safety belt that would detect even flooding
clients that initially behave well in order to fool the server's checks.

Secondly, in addition to the above, we introduce additional checks to
detect outright misbehaving clients earlier:

* The code will treat clients that open too many streams (50) without
  sending any data for processing as flooding ones;
* The clients that managed to send 1.5 KiB of data without opening a
  single stream or submitting at least some DNS data will be treated
  as flooding ones.

Of course, the behaviour described above is nothing more than a set of
heuristic checks, so they can never be perfect. At the same time,
they should be reasonable enough not to drop any valid clients,
relatively easy to implement, and have negligible computational
overhead.

(cherry picked from commit 3425e4b1d0)
2025-01-15 16:49:23 +01:00
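The heuristics described in this commit message can be sketched roughly as follows. This is an illustration in Python, not the C code from the commit: the class, its field names, and the minimum useful-data ratio (MIN_USEFUL_RATIO) are assumptions, since the message does not state the ratio. Only the 128 KiB grace period, the 50-stream limit, and the 1.5 KiB limit come from the message.

```python
from dataclasses import dataclass

GRACE_BYTES = 128 * 1024     # grace period, from the commit message
MAX_IDLE_STREAMS = 50        # streams opened without sending any data
MAX_BYTES_NO_STREAM = 1536   # 1.5 KiB without a stream or any DNS data
MIN_USEFUL_RATIO = 0.1       # assumption: the message gives no number


@dataclass
class SessionStats:
    """Hypothetical per-connection counters."""
    processed_bytes: int = 0   # all HTTP/2 bytes processed so far
    useful_bytes: int = 0      # bytes that carried DNS data
    streams_opened: int = 0
    streams_with_data: int = 0


def is_flooding(s: SessionStats) -> bool:
    # Early check: many streams opened, none of them carrying data.
    if s.streams_with_data == 0 and s.streams_opened >= MAX_IDLE_STREAMS:
        return True
    # Early check: enough bytes received, but no stream and no DNS data.
    if (s.processed_bytes >= MAX_BYTES_NO_STREAM
            and s.streams_opened == 0 and s.useful_bytes == 0):
        return True
    # Main check: once past the grace period, require a minimum ratio of
    # useful (DNS) data to processed data.
    if s.processed_bytes > GRACE_BYTES:
        return s.useful_bytes / s.processed_bytes < MIN_USEFUL_RATIO
    return False
```

The two early checks catch clients that misbehave from the start, while the ratio check also catches clients that behave well at first and only start flooding later.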
Artem Boldariev
11a2956dce DoH: process data chunk by chunk instead of all at once
Initially, our DNS-over-HTTP(S) implementation would try to process as
much incoming data from the network as possible. However, that might
be undesirable as we might create too many streams (each effectively
backed by a ns_client_t object). That is too forgiving as it might
overwhelm the server and thrash its memory allocator, causing high CPU
and memory usage.

Instead of doing that, we resort to processing incoming data using a
chunk-by-chunk processing strategy. That is, we split data into small
chunks (currently 256 bytes) and process each of them
asynchronously. However, we can process more than one chunk at
once (up to 4 currently), given that the number of HTTP/2 streams has
not increased while processing a chunk.

That alone is not enough, though. In addition to the above, we should
limit the number of active streams: those streams for which we have
received a request and started processing it (the ones for which a
read callback was called), as it is perfectly fine to have more opened
streams than active ones. In the case we have reached or surpassed the
limit of active streams, we stop reading AND processing the data from
the remote peer. The number of active streams is effectively decreased
only when responses associated with the active streams are sent to the
remote peer.

Overall, this strategy is very similar to the one used for other
stream-based DNS transports like TCP and TLS.

(cherry picked from commit 9846f395ad)
2025-01-15 16:47:21 +01:00
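The chunk-by-chunk strategy from this commit message can be sketched as a single processing pass. This is a simplified Python illustration under stated assumptions, not the actual implementation: the function name and the two callbacks (`feed`, `stream_count`) are hypothetical stand-ins for the HTTP/2 parser and session state, and the asynchronous rescheduling of the remainder is left to the caller. The chunk size (256 bytes) and the per-pass maximum (4 chunks) are taken from the message.

```python
from typing import Callable

CHUNK_SIZE = 256         # bytes per chunk, from the commit message
MAX_CHUNKS_AT_ONCE = 4   # chunks per pass, from the commit message


def process_pass(data: bytes,
                 feed: Callable[[bytes], None],
                 stream_count: Callable[[], int]) -> bytes:
    """Process up to MAX_CHUNKS_AT_ONCE chunks; return the unprocessed rest.

    `feed` hands one chunk to the (hypothetical) HTTP/2 parser and
    `stream_count` reports how many streams are currently open.
    """
    offset = 0
    for _ in range(MAX_CHUNKS_AT_ONCE):
        if offset >= len(data):
            break
        before = stream_count()
        feed(data[offset:offset + CHUNK_SIZE])
        offset += CHUNK_SIZE
        # A new stream appeared while processing this chunk: stop here and
        # let the caller schedule the remainder for a later pass.
        if stream_count() > before:
            break
    return data[offset:]
```

A caller would keep invoking the pass on the returned remainder until it is empty, and would stop reading from the peer altogether once the active stream limit is reached.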
Artem Boldariev
125bfd71d3 Add isc__nm_async_run()
This commit adds isc__nm_async_run() which is very similar to
isc_async_run() in newer versions of BIND: it allows calling a
callback asynchronously.

Potentially, it can be used to replace some other async operations in
other networking code, in particular the delayed I/O calls in TLS and
TCP DNS transports to name a few and remove quite a lot of code, but
we are unlikely to do that for the strictly maintenance only
branch, so it is protected with DoH-related #ifdefs.

It is implemented in a "universal" way mainly because doing it in the
specific code requires the same amount of code and is not simpler.
2025-01-15 16:43:47 +01:00
Artem Boldariev
13d521fa5f Implement TLS manual read timer control functionality
This commit adds a manual TLS read timer control mode which is
supposed to override automatic resetting of the timer when any data is
received.

It both depends on and complements similar functionality in TCP.
2025-01-15 15:34:43 +00:00
Artem Boldariev
a67b325542 Implement TCP manual read timer control functionality
This commit adds a manual TCP read timer control mode which is
supposed to override automatic resetting of the timer when any data is
received. That can be accomplished by
`isc__nmhandle_set_manual_timer()`.

This functionality is supposed to be used by multilevel networking
transports which require finer grained control over the read
timer (TLS Stream, DoH).

The commit is essentially an implementation of the functionality from
newer versions of BIND.
2025-01-15 15:34:43 +00:00
Andoni Duarte
c6e6a7af8a [9.18] [CVE-2024-11187] sec: usr: Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if the answer has more than 13 names in the RDATA. This limits the number of lookups into the database(s) during a single client query, reducing query processing load.

Backport of MR !750

See isc-projects/bind9#5034

Merge branch '5034-security-limit-additional-9.18' into 'v9.18.33-release'

See merge request isc-private/bind9!759
2025-01-15 13:27:08 +00:00
Ondřej Surý
fa7b7973e3 Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if
the answer has more than 13 names in the RDATA.  This limits the
number of lookups into the database(s) during a single client query,
reducing query processing load.

Also, don't append any additional data to type=ANY queries. The
answer to ANY is already big enough.

(cherry picked from commit a1982cf1bb)
2025-01-15 14:13:45 +01:00
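The rule in this commit message reduces to a small decision function. The sketch below is only an illustration in Python: the function name and its arguments are hypothetical, and the real check lives in the C query-processing code. The threshold of 13 names and the exclusion of type=ANY queries are taken from the message.

```python
from typing import List

MAX_ADDITIONAL_NAMES = 13  # threshold from the commit message


def should_add_additional(qtype: str, rdata_names: List[str]) -> bool:
    """Return True if additional-section processing should run.

    `rdata_names` stands for the names found in the RDATA of the answer.
    """
    # ANY answers are already large enough; never append additional data.
    if qtype.upper() == "ANY":
        return False
    # Skip additional processing when the answer carries more than 13 names.
    return len(rdata_names) <= MAX_ADDITIONAL_NAMES
```

For example, an MX answer with 13 exchange names would still get additional data, while one with 14 would not.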
Ondřej Surý
cd48dcb0f8 Isolate using the -T noaa flag only for part of the resolver test
Instead of running the whole resolver/ns4 server with -T noaa flag,
use it only for the part where it is actually needed.  The -T noaa
could interfere with other parts of the test because the answers don't
have the authoritative-answer bit set, and we could have false
positives (or false negatives) in the test because the authoritative
server doesn't follow the DNS protocol for all the tests in the resolver
system test.

(cherry picked from commit e51d4d3b88)
2025-01-15 14:13:17 +01:00
Nicki Křížek
a3fe766fe9 [9.18] new: ci: Add shotgun perf test of DoH GET to CI
Add performance tests of DoH using the GET protocol to nightly pipelines.

Backport of MR !9926

Merge branch 'backport-nicki/ci-shotgun-doh-get-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9940
2025-01-08 14:13:04 +00:00
Nicki Křížek
934b57040f Add shotgun perf test of DoH GET to CI
(cherry picked from commit 32c5f24713)
2025-01-08 13:46:54 +00:00
Arаm Sаrgsyаn
f68e60b3dc fix: dev: Fix a bug in isc_rwlock_trylock()
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.

Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.

Closes #5121

Merge branch 'aram/isc_rwlock_trylock-bugfix-9.18' into 'bind-9.18'

See merge request isc-projects/bind9!9937
2025-01-08 10:29:14 +00:00
Aram Sargsyan
73b6d9e9e5 Fix a bug in isc_rwlock_trylock()
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.

Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.
2025-01-07 13:30:26 +00:00
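The bit arithmetic behind this fix can be shown with a small sketch. This is an illustration in Python rather than the C code in question, and it assumes a layout in which the lowest bit of 'cntflag' is the WRITER_ACTIVE flag and each reader adds READER_INCR; the concrete constant values and the helper names are assumptions made for the example.

```python
WRITER_ACTIVE = 0x1  # assumed flag bit marking an active writer
READER_INCR = 0x2    # assumed increment added per reader


def only_reader_buggy(cntflag: int) -> bool:
    """Old check: compares the raw value, so an active writer hides the
    fact that the failing reader was the only one."""
    return cntflag == READER_INCR


def only_reader_fixed(cntflag: int) -> bool:
    """Fixed check: clear WRITER_ACTIVE before testing for other readers."""
    return (cntflag & ~WRITER_ACTIVE) == READER_INCR
```

Here `cntflag` stands for the counter value observed by the reader whose trylock failed, still including its own increment. With a writer active and no other readers, the value is `READER_INCR | WRITER_ACTIVE`, which the old comparison never matches, so the waiting writers would not be woken.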