The 'I:checking that lifting the limit will allow everything to get
cached (20)' test was failing due to the TTL of the records being
too short for the elapsed time of the test. Raise the TTL to fix
this and adjust other tests as needed.
(cherry picked from commit 1a58bd2113)
For technical reasons, --with-readline=libedit is no longer tested on
FreeBSD, as it's hard to have anchors that are both unified and specific.
(cherry picked from commit e0df774ca0)
ZONEMD digests RRSIG records and potentially digests SIG records. Add digest
methods for both record types.
Closes #5219
Backport of MR !10217
Merge branch 'backport-5219-add-digest-methods-for-sig-and-rrsig-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10219
ZONEMD needs to be able to digest SIG and RRSIG records. The signer
field can be compressed in SIG, so we need to call dns_name_digest().
While the signer field of RRSIG records is not compressed, the
canonical form has the signer field downcased (RFC 4034, 6.2). This
also implies that compare_rrsig needs to downcase the signer field
during comparison.
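To illustrate the canonical-form rule, here is a minimal sketch of downcasing an uncompressed wire-format signer name before digesting or comparing it. The helpers canonical_name() and signer_equal() are hypothetical and are not BIND's API; they assume a plain length-prefixed name with no compression pointers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not BIND's API): copy an uncompressed wire-format
 * name (length-prefixed labels ending with the zero-length root label) into
 * `out`, downcasing ASCII letters for the canonical form (RFC 4034, 6.2).
 * Returns the number of bytes written, or 0 on error.
 */
static size_t
canonical_name(const unsigned char *in, size_t inlen, unsigned char *out,
	       size_t outlen) {
	size_t pos = 0;

	assert(in != NULL && out != NULL);
	while (pos < inlen) {
		size_t len = in[pos];

		/* len > 63 also rejects compression pointers (top bits 11). */
		if (len > 63 || pos + 1 + len > inlen || pos + 1 + len > outlen) {
			return 0;
		}
		out[pos] = (unsigned char)len;
		for (size_t i = 1; i <= len; i++) {
			/* tolower() only maps A-Z in the default "C" locale. */
			out[pos + i] = (unsigned char)tolower(in[pos + i]);
		}
		pos += 1 + len;
		if (len == 0) {
			return pos; /* root label reached */
		}
	}
	return 0; /* missing root label */
}

/* Hypothetical: equality of two signer names after downcasing both. */
static bool
signer_equal(const unsigned char *a, size_t alen, const unsigned char *b,
	     size_t blen) {
	unsigned char ca[255], cb[255];
	size_t la = canonical_name(a, alen, ca, sizeof(ca));
	size_t lb = canonical_name(b, blen, cb, sizeof(cb));

	return la > 0 && la == lb && memcmp(ca, cb, la) == 0;
}
```

A comparison that skipped the downcasing step would treat "WwW.Example.ORG" and "www.example.org" as different signers, even though their canonical forms are identical.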
(cherry picked from commit 006c5990ce)
When a system test is run with the `USE_RR` environment variable set to 1, an `rr` trace is now correctly generated for each instance of `named`.
Closes #5079
Backport of MR !10197
Merge branch 'backport-5079-fix-rr-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10208
When running a system test with the USE_RR environment
variable set to 1, an rr trace is generated for named.
Because rr wasn't run using libtool --mode=execute, the
trace was actually generated for the wrapper script
created by libtool, not for the actual named binary.
(cherry picked from commit 00d7c7c346)
Change all the non-locked operations on `quota->used` and
`quota->waiting` to "acq/rel" for inter-thread synchronization. Some
loads are left as "relaxed", because they are under a locked mutex
which also provides protection.
Also use relaxed memory ordering for `quota->max` and `quota->soft`,
as done in the main branch; possible ordering issues for these
variables are acceptable.
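A minimal C11 sketch of the ordering scheme described above: the counter is updated with acquire/release semantics, while the limit is read with relaxed ordering. The struct and functions are hypothetical simplifications, not BIND's isc_quota implementation (which also tracks waiters and a soft limit).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified quota; not BIND's isc_quota_t. */
struct quota {
	atomic_uint max;  /* limit; read with relaxed ordering */
	atomic_uint used; /* current users; acquire/release */
};

static void
quota_init(struct quota *q, unsigned int max) {
	atomic_init(&q->max, max);
	atomic_init(&q->used, 0);
}

/* Try to take one slot without holding a lock. */
static bool
quota_attach(struct quota *q) {
	unsigned int max = atomic_load_explicit(&q->max, memory_order_relaxed);
	unsigned int used = atomic_load_explicit(&q->used,
						 memory_order_acquire);

	while (used < max) {
		if (atomic_compare_exchange_weak_explicit(
			    &q->used, &used, used + 1, memory_order_acq_rel,
			    memory_order_acquire))
		{
			return true;
		}
		/* A failed CAS reloaded `used`; re-check the limit. */
	}
	return false;
}

/* Release one slot; release ordering publishes prior writes. */
static void
quota_detach(struct quota *q) {
	unsigned int prev = atomic_fetch_sub_explicit(&q->used, 1,
						      memory_order_release);
	assert(prev > 0);
}
```

A stale relaxed read of max can only make a single attach decision slightly out of date, which matches the "acceptable" trade-off described in the commit.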
Closes #5018
Merge branch '5018-quota-memory-ordering-fixes-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10203
Change all the non-locked operations on 'quota->used' and
'quota->waiting' to "acq/rel" for inter-thread synchronization. Some
loads are left as "relaxed", because they are under a locked mutex
which also provides protection.
This commit bumps the total number of active streams (i.e., the opened
streams for which a request has been received but the response is not
yet ready) to 60% of the total streams limit.
The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.
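The arithmetic behind the new limit can be sketched as follows. Only the 60% figure comes from the commit; the function names and the "at least one stream" floor are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Percentage from the commit message; all names here are hypothetical. */
#define ACTIVE_STREAMS_PERCENT 60

/* Maximum number of "active" streams: opened, request received,
 * response not ready yet. */
static uint32_t
max_active_streams(uint32_t max_concurrent_streams) {
	uint64_t limit;

	assert(max_concurrent_streams > 0);
	limit = (uint64_t)max_concurrent_streams * ACTIVE_STREAMS_PERCENT / 100;
	/* Assumption: always allow at least one active stream. */
	return limit > 0 ? (uint32_t)limit : 1;
}

/* Whether a newly received request may start being processed. */
static bool
can_process_request(uint32_t active, uint32_t max_concurrent_streams) {
	return active < max_active_streams(max_concurrent_streams);
}
```

With a 100-stream limit, up to 60 streams can be active at once.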
(cherry picked from commit eaad0aefe6)
The check, while not active by default, has not been valid since
commit 8b8f4d500d.
See the 'if (total == 0) { ...' branch below to understand why.
(cherry picked from commit 217a1ebd79)
Previously, the code would try to avoid sending any data regardless of
what it is unless:
a) The flush limit is reached;
b) There are no sends in flight.
This strategy is used to avoid issuing too many send requests, each
carrying a small amount of data. However, it has proven to be too
aggressive and, in fact, harms performance in some cases (e.g., on
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*").
Now, in addition to the listed cases, we also:
c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).
That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.
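The resulting flush decision, covering cases a), b), and c), can be sketched as a single predicate. The struct, its field names, and the early return for an empty buffer are hypothetical and do not reflect BIND's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; names are illustrative, not BIND's. */
struct flush_state {
	size_t pending_bytes;     /* data buffered but not yet sent */
	size_t flush_limit;       /* threshold for case a) */
	unsigned sends_in_flight; /* case b) when zero */
	bool has_send_cb;         /* case c): an outgoing DNS message */
};

static bool
should_flush(const struct flush_state *st) {
	assert(st != NULL);
	if (st->pending_bytes == 0) {
		return false; /* assumption: nothing to send */
	}
	return st->pending_bytes >= st->flush_limit /* a) */
	       || st->sends_in_flight == 0         /* b) */
	       || st->has_send_cb;                 /* c) */
}
```

Case c) means that a small DNS response no longer waits behind in-flight sends just because it is below the flush limit.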
(cherry picked from commit c5f7968856)
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.
This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.
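The "at most one pending request" rule reduces to a flag that is set when the delayed call is scheduled and cleared when it runs. The sketch below is hypothetical: it replaces the real libuv scheduling with a counter so that the effect is observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session; real code would enqueue work on the event loop. */
struct session {
	bool async_queued;     /* true while a delayed call is pending */
	unsigned queued_total; /* scheduling operations, for illustration */
};

static void
schedule_do_bio(struct session *s) {
	if (s->async_queued) {
		return; /* one pending request is enough */
	}
	s->async_queued = true;
	s->queued_total++; /* real code would allocate and enqueue here */
}

/* Runs on the next event-loop tick. */
static void
do_bio_async_cb(struct session *s) {
	assert(s->async_queued);
	/* Clear first, so the processing below may requeue if needed. */
	s->async_queued = false;
	/* http_do_bio() would run here and may call schedule_do_bio(). */
}
```

Because the flag is cleared before processing, http_do_bio() can still requeue itself, so no progress is lost.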
(cherry picked from commit 0e1b02868a)
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. The way it was structured before seems to
have been an indirect consequence of the bug with missing callback
calls, fixed in 8b8f4d500d.
The change introduced by this commit is known to remove a bottleneck
and yields a reproducible, measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.
Additionally, it fixes a similar issue where send callbacks could
potentially be left uncalled, and hardens the code against potential
use-after-free errors related to the session object.
(cherry picked from commit 0956fb9b9e)
Currently, the ChangeLog file is a dangling symlink pointing to the
removed CHANGES file. Fix the link by pointing to doc/arm/changelog.rst.
(cherry picked from commit de0598cbc3)
29fd756408 replaced "only" with "rules" in
.gitlab-ci.yml but forgot to drop the code that deletes the "only"
keyword from this script, so the script was broken.
(cherry picked from commit 6e2272d769)
Backport of MR !10185
Merge branch 'mnowak/do-not-delete-only-keyword-in-generate-tsan-stress-jobs' into 'bind-9.18'
See merge request isc-projects/bind9!10188
29fd756408 replaced "only" with "rules" in
.gitlab-ci.yml but forgot to drop the code that deletes the "only"
keyword from this script, so the script was broken.
(cherry picked from commit 6e2272d769)
Execute DNS Shotgun performance tests on regular MRs and compare the changes they introduce against the MR diff base. The results are evaluated automatically: the shotgun jobs fail if the thresholds for CPU/memory/latency differences are exceeded.
Backport of MR !10127
Merge branch 'backport-nicki/ci-shotgun-eval-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10184
The rules keyword allows more flexible and complex conditions when
deciding whether to create the job, and also makes it possible to tweak
variables or job properties depending on arbitrary rules. Since it's
not possible to combine only/except with rules, replace all
uses of only/except to avoid any potential future issues.
(cherry picked from commit 29fd756408)
If the shotgun tests are executed for MRs, compare them against the MR's
base rather than the previous release. Only fail the job if
performance drops (pass on performance improvements).
Note that the start_in optimization was removed, since it isn't properly
supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be re-built
unnecessarily when testing different protocols. A workaround for the
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
(cherry picked from commit 4214c1e8a7)
The `NS_QUERY_DONE_BEGIN` and `NS_QUERY_DONE_SEND` plugin hooks could cause a reference leak if they returned `NS_HOOK_RETURN` without cleaning up the query context properly.
Closes #2094
Backport of MR !9971
Merge branch 'backport-2094-plugin-reference-leak-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10171
If the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped, leading
to reference leaks that can cause named to hang on
shutdown.
This has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
(cherry picked from commit c2e4358267)
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
Closes #5197
Backport of MR !10157
Merge branch 'backport-5197-cache_name-logic-error-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10159
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
(cherry picked from commit d0fd9cbe3b)
With RPZ in use, `named` could terminate unexpectedly because of a race condition when a reconfiguration command was received using `rndc`. This has been fixed.
Closes #5146
Backport of MR !10079
Merge branch 'backport-5146-rpz-reconfig-bug-fix-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10145
After a reconfiguration, the old view can be left without a valid
'rpzs' member: when the RPZ is not changed during the named
reconfiguration, 'rpzs' "migrates" from the old view into the new
view. So when a query resumes, it can find that 'qctx->view->rpzs'
is NULL, which query_resume() currently doesn't expect to happen if
it's recursing and 'qctx->rpz_st' is not NULL.
Fix the issue by adding a NULL check. In order not to split the log
message into two different log messages depending on whether
'qctx->view->rpzs' is NULL or not, change the message to not log
the RPZ policy's "version", which is just a runtime counter and is
most likely not very useful to users.
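The shape of the fix is a defensive NULL check before the old view's RPZ data is used. The sketch below uses hypothetical, heavily simplified stand-ins for the view and query-context structures. It is not the actual query_resume() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct rpzs {
	int placeholder;
};
struct view {
	struct rpzs *rpzs; /* may be NULL after it migrated to a new view */
};
struct qctx {
	struct view *view;
	void *rpz_st; /* non-NULL if RPZ processing was in progress */
};

/* Whether RPZ processing may continue for a resumed, recursing query. */
static bool
rpz_usable_on_resume(const struct qctx *qctx) {
	assert(qctx != NULL && qctx->view != NULL);
	if (qctx->rpz_st == NULL) {
		return false; /* no RPZ state for this query */
	}
	/* Do not assume the old view still owns its 'rpzs'. */
	return qctx->view->rpzs != NULL;
}
```

Dropping the policy version from the log message avoids a second dereference of 'rpzs' in the logging path.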
(cherry picked from commit 3ea2fbc238)
Previously, when parsing responses, named incorrectly rejected responses without matching RRSIG records for NSEC/DS/NSEC3 records in the authority section. This rejection, if appropriate, should have been left for the validator to determine and has been fixed.
Closes #5185
Backport of MR !10125
Merge branch 'backport-5185-remove-rrsig-check-from-dns_message_parse-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10143
Checking whether the authority section is properly signed should
be left to the validator. Checking in getsection (dns_message_parse)
was way too early and resulted in resolution failures of lookups
that should have otherwise succeeded.
(cherry picked from commit 83159d0a54)
The cache has been updated so that if new data is rejected - for example, because there was already existing data at a higher trust level - then its covering RRSIG will also be rejected.
Closes #5132
Backport of MR !9999
Merge branch 'backport-5132-improve-cd-behavior-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10135
Add a zone with different NS RRsets in the parent and child,
and test resolver and forwarder behavior with and without +CD.
(cherry picked from commit e4652a0444)
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.
When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.
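One plausible reading of "equal in DNSSEC terms" is "same type and the same set of canonical rdata, regardless of order". The sketch below is a hypothetical illustration that assumes this reading and ignores TTLs. It is not the actual dns_rdataset_equals() implementation. It sorts rdata as left-justified unsigned octet strings, the ordering that RFC 4034, 6.3 uses for canonical RRsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified types; not BIND's dns_rdataset_t. */
struct rdata {
	const unsigned char *data; /* assumed already in canonical form */
	size_t len;
};
struct rdataset {
	uint16_t type;
	uint32_t ttl; /* ignored by the comparison (assumption) */
	size_t count;
	const struct rdata *rdatas;
};

/* Octet-wise comparison; a shorter prefix sorts first. */
static int
rdata_cmp(const void *a, const void *b) {
	const struct rdata *x = a, *y = b;
	size_t n = x->len < y->len ? x->len : y->len;
	int r = memcmp(x->data, y->data, n);

	if (r != 0) {
		return r;
	}
	return (x->len > y->len) - (x->len < y->len);
}

static bool
rdataset_equals(const struct rdataset *a, const struct rdataset *b) {
	assert(a != NULL && b != NULL);
	if (a->type != b->type || a->count != b->count) {
		return false;
	}
	if (a->count == 0) {
		return true;
	}
	struct rdata *sa = malloc(a->count * sizeof(*sa));
	struct rdata *sb = malloc(b->count * sizeof(*sb));
	bool eq = false;
	if (sa != NULL && sb != NULL) {
		/* Sort copies so that record order does not matter. */
		memcpy(sa, a->rdatas, a->count * sizeof(*sa));
		memcpy(sb, b->rdatas, b->count * sizeof(*sb));
		qsort(sa, a->count, sizeof(*sa), rdata_cmp);
		qsort(sb, b->count, sizeof(*sb), rdata_cmp);
		eq = true;
		for (size_t i = 0; i < a->count && eq; i++) {
			eq = rdata_cmp(&sa[i], &sb[i]) == 0;
		}
	}
	free(sa);
	free(sb);
	return eq;
}
```

Under this reading, a rejected lower-trust RRset that compares equal to the cached one is covered by the same signature, so keeping its RRSIG cannot cause a validation failure.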
(cherry picked from commit 6aba56ae89)
The value returned by http_send_outgoing() is not used anywhere, so we
make it return nothing (void). This is probably an omission from
older code.
(cherry picked from commit 2adabe835a)
When handling outgoing data, there were a couple of rarely executed
code paths that did not take into account that the callback MUST be
called.
This could lead to memory leaks and, consequently, shutdown hangs.
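A common C pattern for ensuring that a callback "MUST be called" is to route every path, including rare error paths, through a single exit label. The sketch below is hypothetical: the names are invented, and it invokes the callback synchronously for simplicity, whereas real transport code would call it after the write completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes and callback type, for illustration only. */
typedef enum { RES_OK, RES_CANCELED, RES_NOMEMORY } result_t;
typedef void (*send_cb_t)(result_t result, void *arg);

/* Every path ends at `done`, so the callback runs exactly once. */
static void
send_outgoing(bool closing, bool alloc_fails, send_cb_t cb, void *arg) {
	result_t result = RES_OK;

	if (closing) { /* rarely executed path */
		result = RES_CANCELED;
		goto done;
	}
	if (alloc_fails) { /* rarely executed path */
		result = RES_NOMEMORY;
		goto done;
	}
	/* ... hand the data to the transport here ... */
done:
	if (cb != NULL) {
		cb(result, arg);
	}
}

/* Demo callback: counts invocations. */
static void
count_cb(result_t result, void *arg) {
	(void)result;
	assert(arg != NULL);
	(*(int *)arg)++;
}
```

If an early return skipped the callback, the caller's reference would never be released. That is how such a leak turns into a shutdown hang.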
(cherry picked from commit 8b8f4d500d)
This commit changes how the number of active HTTP streams is
calculated and allows it to scale with the maximum number of
streams per connection, instead of effectively capping it at
STREAM_CLIENTS_PER_CONN.
The original limit is intended to define the pipelining limit
for TCP/DoT. However, it turned out to be too restrictive for DoH, which
works quite differently and implements pipelining at the protocol level
by multiplexing multiple streams. That renders each stream
effectively a separate connection from the point of view of the
rest of the codebase.
(cherry picked from commit a22bc2d7d4)
Previously, we would limit the amount of incoming data to process
based solely on the presence of uncompleted send requests. That worked;
however, it was found to severely degrade performance in certain
cases, as revealed during extended testing.
Now we keep track of how much data is in flight (or ready to be in
flight) and limit the amount of processed incoming data when the
amount of in-flight data surpasses a given threshold, similarly
to what we do in other transports.
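The accounting can be sketched as a byte counter that grows when a send is queued and shrinks when it completes, with reading paused while the counter exceeds a threshold. The struct, field names, and threshold are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state; the threshold is illustrative. */
struct conn {
	size_t inflight_bytes; /* queued or currently being written */
	size_t max_inflight;   /* above this, stop processing input */
	bool reading;          /* whether incoming data is processed */
};

static void
update_reading(struct conn *c) {
	c->reading = c->inflight_bytes <= c->max_inflight;
}

static void
on_send_queued(struct conn *c, size_t n) {
	c->inflight_bytes += n;
	update_reading(c);
}

static void
on_send_done(struct conn *c, size_t n) {
	assert(c->inflight_bytes >= n);
	c->inflight_bytes -= n;
	update_reading(c);
}
```

Unlike a check on whether any send is pending, this lets input processing continue while a small amount of output is still in flight.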
(cherry picked from commit 05e8a50818)
When processing a query with the "checking disabled" bit set (CD=1), `named` stores the unvalidated result in the cache, marked "pending". When the same query is sent with CD=0, the cached data is validated, and either accepted as an answer, or ejected from the cache as invalid. This deferred validation was not attempted for DS and DNSKEY records if they had no cached signatures, causing spurious validation failures. We now complete the deferred validation in this scenario.
Also, if deferred validation fails, we now re-query the data to find out whether the zone has been corrected since the invalid data was cached.
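The deferred-validation flow described above can be summarized as a small decision function. This is a hypothetical model, not BIND's cache or validator code; `validate_ok` stands in for the validator's verdict, which in the fixed code is obtained even for DS and DNSKEY records that have no cached signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache entry; not BIND's data structures. */
struct cache_entry {
	bool present;
	bool pending; /* cached from a CD=1 query, not yet validated */
};

typedef enum { SERVE_AS_IS, SERVE_VALIDATED, REQUERY } action_t;

/* `validate_ok` stands in for the validator's verdict (hypothetical). */
static action_t
handle_cached(struct cache_entry *e, bool cd, bool validate_ok) {
	assert(e->present);
	if (!e->pending || cd) {
		return SERVE_AS_IS; /* already validated, or CD=1 */
	}
	if (validate_ok) {
		e->pending = false; /* deferred validation succeeded */
		return SERVE_VALIDATED;
	}
	e->present = false; /* eject the invalid data from the cache */
	return REQUERY;     /* the zone may have been corrected */
}
```

Re-querying after ejection lets a zone that has since been fixed resolve correctly, rather than failing solely because of stale invalid data.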
Closes #5066
Backport of MR !10104
Merge branch 'backport-5066-fix-strip-dnssec-rrsigs-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10106
If a deferred validation on data that was originally queried with
CD=1 fails, we now repeat the query, since the zone data may have
changed in the meantime.
(cherry picked from commit 04b1484ed8)
When a query is made with CD=1, we store the result in the
cache marked pending so that it can be validated later, at
which time it will either be accepted as an answer or removed
from the cache as invalid. Deferred validation was not
attempted when there were no cached RRSIGs for DNSKEY and
DS. We now complete the deferred validation in this scenario.
(cherry picked from commit 8b900d1808)