This commit simplifies code flow in the tls_cycle_input() and makes
the incoming data processing similar to that in TCP DNS. In
particular, now we decipher all the the incoming data before making a
single isc__nm_process_sock_buffer() call. Previously we would try to
decipher data bit-by-bit before trying to process the deciphered bit
via isc__nm_process_sock_buffer(). Doing like before made the code
much less predictable, in particular in the areas like when reading is
paused or resumed.
The newer approach also allowed us to get rid of some old kludges.
The false positive rate is about 10-20 % when evaluating shotgun results
from a single run. Attempt to reduce the false positive rate by allowing
a re-run of failed jobs.
Backport of MR !10271
Merge branch 'backport-nicki/ci-shotgun-reduce-false-positives-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10280
The false positive rate is about 10-20 % when evaluating shotgun results
from a single run. Attempt to reduce the false positive rate by allowing
a re-run of failed jobs.
While there is a slight risk that barely noticable decreases in
performance might slip by more easily in MRs, they'd still likely pop up
during nightly or pre-release testing.
Also increase the tolerance threshold for DoH latency comparisons, as
those tests often experience increased jitter in the tail end latencies.
(cherry picked from commit 5eab352478)
When `-T cookiealwaysvalid` is passed to `named`, DNS cookie checks for
the incoming queries always pass, given they are structurally correct.
Backport of MR !10232
Merge branch 'backport-aram/new-named-minus-T-option-of-cookiealwaysvalid-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10265
When -T cookiealwaysvalid is passed to named, DNS cookie checks for
the incoming queries always pass, given they are structurally correct.
(cherry picked from commit 807ef8545d)
Add missing locks in dns_zone_getxfrsource4 et al. Addresses CID 468706, 468708, 468741, 468742, 468785, and 468778.
Cleanup dns_zone_setxfrsource4 et al to now return void.
Remove double copies with dns_zone_getprimaryaddr and dns_zone_getsourceaddr.
Closes#4933
Backport of MR !9485
Merge branch 'backport-4933-add-missing-locks-when-returning-addresses-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10260
Add missing locks in dns_zone_getxfrsource4 et al. Addresses CID
468706, 468708, 468741, 468742, 468785 and 468778.
Cleanup dns_zone_setxfrsource4 et al to now return void.
Remove double copies with dns_zone_getprimaryaddr and dns_zone_getsourceaddr.
(cherry picked from commit d0a59277fb)
The `I:checking that lifting the limit will allow everything to get
cached (20)` test was failing due to the TTL of the records being
too short for the elapsed time of the test. Raise the TTL to fix
this and adjust other tests as needed.
Closes#5206
Backport of MR !10177
Merge branch 'backport-5206-tune-last-sub-test-of-reclimit-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10250
The 'I:checking that lifting the limit will allow everything to get
cached (20)' test was failing due to the TTL of the records being
too short for the elapsed time of the test. Raise the TTL to fix
this and adjust other tests as needed.
(cherry picked from commit 1a58bd2113)
From technical reasons --with-readline=libedit is not being tested on
FreeBSD anymore as it's hard to have anchors both unified and specific.
(cherry picked from commit e0df774ca0)
ZONEMD digests RRSIG records and potentially digests SIG record. Add digests
methods for both record types.
Closes#5219
Backport of MR !10217
Merge branch 'backport-5219-add-digest-methods-for-sig-and-rrsig-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10219
ZONEMD needs to be able to digest SIG and RRSIG records. The signer
field can be compressed in SIG so we need to call dns_name_digest().
While for RRSIG the records the signer field is not compressed the
canonical form has the signer field downcased (RFC 4034, 6.2). This
also implies that compare_rrsig needs to downcase the signer field
during comparison.
(cherry picked from commit 006c5990ce)
When a system test is run with the `USE_RR` environment variable set to 1, an `rr` trace is now correctly generated for each instance of `named`.
Closes#5079
Backport of MR !10197
Merge branch 'backport-5079-fix-rr-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10208
when running a system test with the USE_RR environment
variable set to 1, an rr trace is generated for named.
because rr wasn't run using libtool --mode=execute, the
trace would actually be generated for the wrapper script
generated by libtool, not for the actual named binary.
(cherry picked from commit 00d7c7c346)
Change all the non-locked operations on `quota->used` and
`quota->waiting` to "acq/rel" for inter-thread synchronization. Some
loads are left as "relaxed", because they are under a locked mutex
which also provides protection.
Also use relaxed memory ordering for `quota->max` and `quota->soft`,
as done in the main branch; possible ordering issues for these
variables are acceptable.
Closes#5018
Merge branch '5018-quota-memory-ordering-fixes-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10203
Change all the non-locked operations on 'quota->used' and
'quota->waiting' to "acq/rel" for inter-thread synchronization. Some
loads are left as "relaxed", because they are under a locked mutex
which also provides protection.
This commit bumps the total number of active streams (= the opened
streams for which a request is received, but response is not ready) to
60% of the total streams limit.
The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.
(cherry picked from commit eaad0aefe6)
The check, while not active by default, is not valid since the commit
8b8f4d500d.
See 'if (total == 0) { ...' below branch to understand why.
(cherry picked from commit 217a1ebd79)
Previously, the code would try to avoid sending any data regardless of
what it is unless:
a) The flush limit is reached;
b) There are no sends in flight.
This strategy is used to avoid too numerous send requests with little
amount of data. However, it has been proven to be too aggressive and,
in fact, harms performance in some cases (e.g., on longer (≥1h) runs
of "stress:long:rpz:doh+udp:linux:*").
Now, additionally to the listed cases, we also:
c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).
That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.
(cherry picked from commit c5f7968856)
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.
This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.
(cherry picked from commit 0e1b02868a)
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that the way it was structured
before was indirectly caused by the presence of the missing callback
calls bug, fixed in 8b8f4d500d.
The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.
Additionally, it fixes a similar issue with potentially missing send
callback calls processing and hardens the code against use-after-free
errors related to the session object (they can potentially occur).
(cherry picked from commit 0956fb9b9e)
Currently, the ChangeLog file is a dangling symlink pointing to the
removed CHANGES file. Fix the link by pointing to doc/arm/changelog.rst.
(cherry picked from commit de0598cbc3)
29fd756408 replaced "only" with "rules" in
.gitlab-ci.yml but forgot to drop the removal from here, hence the
script was broken.
(cherry picked from commit 6e2272d769)
Backport of MR !10185
Merge branch 'mnowak/do-not-delete-only-keyword-in-generate-tsan-stress-jobs' into 'bind-9.18'
See merge request isc-projects/bind9!10188
29fd756408 replaced "only" with "rules" in
.gitlab-ci.yml but forgot to drop the removal from here, hence the
script was broken.
(cherry picked from commit 6e2272d769)
Execute DNS Shotgun performance tests on the regular MRs and compare the changes they introduce against the MR diff base. The results are evaluated automatically - the shotgun jobs will fail if thresholds for CPU/memory/latency difference is exceeded.
Backport of MR !10127
Merge branch 'backport-nicki/ci-shotgun-eval-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10184
The keyword rules allows more flexible and complex conditions when
deciding whether to create the job and also makes it possible run tweak
variables or job properties depending on arbitraty rules. Since it's
not possible to combine only/except and rules together, replace all
uses of only/except to avoid any potential future issues.
(cherry picked from commit 29fd756408)
If the shotgun tests are executed for MRs, compare it against the MR's
base rather than the previous release. Only fail the job in case the
performance drops (pass on performance improvements).
Note that start_in optimization was removed, since it isn't properly
supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be re-built
unnecessarily when testing different protocols. A workaround for the
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
(cherry picked from commit 4214c1e8a7)
The `NS_QUERY_DONE_BEGIN` and `NS_QUERY_DONE_SEND` plugin hooks could cause a reference leak if they returned `NS_HOOK_RETURN` without cleaning up the query context properly.
Closes#2094
Backport of MR !9971
Merge branch 'backport-2094-plugin-reference-leak-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10171
if the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on shut
down.
this has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
(cherry picked from commit c2e4358267)
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
Closes#5197
Backport of MR !10157
Merge branch 'backport-5197-cache_name-logic-error-9.18' into 'bind-9.18'
See merge request isc-projects/bind9!10159