After !9950, respdiff's maximal disagreement percentage needs to be
adjusted as target disagreements between the tested version of the
"main" branch and the reference one jumped for the respdiff,
respdiff:asan, and respdiff:tsan jobs from on average 0.07% to 0.16% and
from 0.12% to 0.17% for the respdiff-third-party job.
In !9950, we concluded setting MAX_DISAGREEMENTS_PERCENTAGE to double
the average disagreement percentage works fine in the CI.
The false positive rate is about 10-20 % when evaluating shotgun results
from a single run. Attempt to reduce the false positive rate by allowing
a re-run of failed jobs.
While there is a slight risk that barely noticable decreases in
performance might slip by more easily in MRs, they'd still likely pop up
during nightly or pre-release testing.
Also increase the tolerance threshold for DoH latency comparisons, as
those tests often experience increased jitter in the tail end latencies.
The vulture tool seems to be unable to follow how the parent classes
defined in bin/tests/system/qmin/qmin_ans.py use mandatory properties
specified by child classes in bin/tests/system/qmin/ans*/ans.py. Make
the tool ignore not just ans.py servers, but also *_ans.py utility
modules above the ansX/ subdirectories to prevent false positives about
unused code from causing CI pipeline failures.
Disabling new dynamic ELF tags ensures the Clang symbolizer creates
valid TSAN reports. For consistency, also add the option to gcc:tsan so
they are both on the same footing.
The keyword rules allows more flexible and complex conditions when
deciding whether to create the job and also makes it possible run tweak
variables or job properties depending on arbitraty rules. Since it's
not possible to combine only/except and rules together, replace all
uses of only/except to avoid any potential future issues.
If the shotgun tests are executed for MRs, compare it against the MR's
base rather than the previous release. Only fail the job in case the
performance drops (pass on performance improvements).
Note that start_in optimization was removed, since it isn't properly
supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be re-built
unnecessarily when testing different protocols. A workaround for the
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
We need to avoid double-triggering of post-merge jobs in the following
scenario:
1. A private MR gets merged into the private BIND 9 repository.
2. This merge operation triggers a "push" pipeline in the private
repository, which correctly runs post-merge jobs, e.g. to set MR
metadata in the private project.
3. When a release is published, a script is run to change the
automatically assigned milestone value ("Not released yet") to
something else.
4. Shortly afterwards, the result of the merge from step 1 is merged
back into a maintenance branch in the public repository.
5. The push operation triggers another "push" pipeline, this time in
the public project.
At this point there are two problems:
- If the script is dumb (like it currently is), it will extract the
merge request ID from the merge commit description and change the
milestone for a merge request in the wrong project namespace.
- Even if the script was fixed to extract and use the correct GitLab
project reference, it would reset the milestone for the merge
request in the private repository back to "Not released yet" - while
the milestone set in step 3 should be retained.
An alternative would be to change the order of operations so that
post-release milestoning happens at a later stage, while also fixing the
script to correctly follow cross-project references, but that approach
seems more fragile than simply failing on all cross-project pushes. The
rule to enforce is: each project should only take care of its own
post-merge tasks.
With shallow fetching working reliably in pygit2 1.17.0+, there is no
longer any need for GitLab CI runners to clone the BIND 9 repository
with a fixed depth of 1000 during every "danger" CI job as Hazard is now
able to fetch remote refs with an arbitrary depth, controlled by the
HAZARD_FETCH_DEPTH environment variable. The latter can be defined via
GitLab project's CI settings and adjusted as needed over time, without
the need to update .gitlab-ci.yml every time its value needs to be
changed.
Reduce the amount of artifacts stored by running make clean at the end
of unit and system test run. If any of the previous commands fail, the
runner will stop executing the commands in `script` immediately, so the
cleanup only happens if none of the previous commands failed.
The build artifacts from unit and system tests are re-used anywhere and
should be safe to throw away immediately. Same for respdiff.
The prior regex didn't match the actual names we use for release
branches in the private repo. This caused the merged-metadata job to not
be created upon merging to a release branch, resulting in the private MR
not being properly milestoned.
Use the correct regex along with protecting the v9.*-release branches in
the gitlab UI so that they have access to the token used to perform the
required API operations.
Add DoH and DoT stress test jobs. The DoH scenario on FreeBSD is omitted
because all Flamethrower's DoH queries timeout on this platform.
Since the response rate of DoT queries is lower than that of DoH and
TCP, the expected TCP response rate is 80%.
Due to the large number of similar stress test configurations, the
"util/generate-stress-test-configs.py" script now generates them as part
of a downstream pipeline. The script is expected to be run exclusively
within the CI environment, which sources all environmental variables and
files.
This refactoring brought the following changes:
- To start a stress test immediately and not wait for artifacts of the
autoreconf job, run the "autoreconf -fi" command as part of every job.
- Drop the BIND_STRESS_TEST_* variables as they were rarely used and
conflicted with mode and platform selection in the configuration
generator.
- Most pipelines now include a few short, randomly selected stress test
jobs. To schedule all stress tests, set the ALL_BIND_STRESS_TESTS
environmental variable, push a tag to CI, or run a scheduled pipeline.
- Set the BIND_STRESS_TESTS_RUN_TIME environmental variable to pick the
stress test runtime of your choosing, set the BIND_STRESS_TESTS_RATE
environmental variable to set different than the default query rate.
- Job timeout is set to 30 minutes plus stress test runtime in minutes.
The changelog job is supposed to test that the text from GitLab MR
title&description is valid rst syntax and can be built with sphinx. In
49128fc1, the way gitchangelog generates entries was changed - it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated notes is actually written to (some)
rendered file which is part of the docs so that the subsequent sphinx
build attempts to render the note.
The base image tends to have a rather old dnspython version and when
used with pylint and mypy it produces errors about newer dnspython
features the old version does not know about.
$ mypy "bin/tests/system/isctest/"
bin/tests/system/isctest/query.py:55: error: Unexpected keyword argument "verify" for "tls" [call-arg]
/usr/lib/python3/dist-packages/dns/query.py:958: note: "tls" defined here
$ pylint --rcfile $CI_PROJECT_DIR/.pylintrc --disable=wrong-import-position $(git ls-files 'bin/tests/system/*.py' | grep -vE 'ans\.py')
************* Module isctest.query
bin/tests/system/isctest/query.py:55:11: E1123: Unexpected keyword argument 'verify' in function call (unexpected-keyword-arg)
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.
The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with ocassional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.
The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.
Exceeding either of those values should be quite clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
gcovr has issues with processing files produced as part of a BIND 9
build with tracing support enabled (--enable-tracing). Depending on the
gcovr version used, these issues may result in either warnings or
failures being reported by that tool. Disable tracing support for
gcovr-enabled builds to work around these issues.
The December releases suffer from the ns2/managed1.conf file not being
in the mkeys extra_artifacts. This manifests only when pytest is run
with the --setup-only option, which is the case in the
cross-version-config-tests CI job. The original issue is fixed in !9815,
but the fix will be effective only when subsequent releases are out.
The #4666 issue removed the "fixed" value for the "rrset-order" option
which is still present in the December release system test and which the
current named can't handle. This will be addressed when when the January
9.21 release is published.
The #4482 issue removed the "dnssec-must-be-secure" feature.
The DLZ modules are poorly maintained as we only ensure they can still
be compiled, the DLZ interface is blocking, so anything that blocks the
query to the database blocks the whole server and they should not be
used except in testing. The DLZ interface itself should be scheduled
for removal.
The cross-version-config-tests job has never functioned in CI because
the testing framework changed after the testing was completed. To run
the new "named" binary using the old configurations, paths in the test
framework must be updated to point to the location of the new binaries.
QPDB is now a default implementation for both cache and zone. Remove
the venerable RBTDB database implementation, so we can fast-track the
changes to the database without having to implement the design changes
to both QPDB and RBTDB and this allows us to be more aggressive when
refactoring the database design.
With the introduction of the generated changelog, the CHANGES file
became a symlink to doc/arm/changelog.rst. After the changes made in
!9549, the changelog file transitioned from being a wholly generated
file to one that includes versioned changelog files, which are
themselves generated. However, while implementing !9549, we overlooked
that the CHANGES file is copied to a release directory on an FTP server
and contains just "include" directives, not the changelog itself.
Therefore, in the same fashion as the "RELEASE-NOTES*.html" file, create
a "CHANGELOG*.html" file that redirects to the Changelog appendix of the
ARM.
Use a different timezone via the TZ variable in at least one of the
system test jobs in order to detect possible issues with timezone
handling in python.
The unit test doh_test tends do fail quite often due to exceeding run
time limit in the unit:clang:freebsd14:amd64 job. Use a retry on gitlab
level to alleviate the issue until a better fix is available.
Due to the recent improvements to the TCP processing, much higher loads
can be handled by BIND9 without causing client timeouts. The updated
parameters give us useful data for both cold and hot cache testing.
DNSRPS was the API for a commercial implementation of Response-Policy
Zones that was supposedly better. However, it was never open-sourced
and has only ever been available from a single vendor. This goes against
the principle that the open-source edition of BIND 9 should contain only
features that are generally available and universal.
This commit removes the DNSRPS implementation from BIND 9. It may be
reinstated in the subscription edition if there's enough interest from
customers, but it would have to be rewritten as a plugin (hook) instead
of hard-wiring it again in so many places.
The cross-version-config-tests job fails when a system test is removed
from the upcoming release. To avoid this, remove the system test also
from the $BIND_BASELINE_VERSION.