Remove exclusive mode when scheduling the zone load, as it is no longer necessary;
data that can be read or written by multiple threads are locked or atomic.
The logic detecting the end of zone DB loading has been refactored
to take into account the fact that zone databases may finish loading
before the function scheduling the loads returns.
Merge branch 'colin/remove-exclusive-zone-load' into 'main'
See merge request isc-projects/bind9!11231
Because the asynchronous loading logic expected all jobs to be scheduled
before any of them ran (scheduling used to happen in exclusive mode),
and because the jobs are scheduled on various threads, there were
random situations where load_zones() would return after the scheduled
zone DB loading had actually run. In such cases, the zl->refs reference
counter in view_loaded() never dropped to 0, and the remaining work to
do once all zones were loaded was never run. In particular,
server->reload_status stayed in the NAMED_RELOAD_PENDING state.
This problem is fixed by handling zoneload_t as a ref-counted object,
shared between load_zones() and each instance of scheduled zone DB
loading. Its destructor function is the code view_loaded() used to run
when zl->refs went to 0. This ensures that the post-loading routine is
called once the last load is done.
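The ownership scheme described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the C code in `named`: the `ZoneLoad` class, its `ref()`/`unref()` methods and this Python `load_zones()` are hypothetical stand-ins for zoneload_t and its callers. The scheduler holds one reference of its own, so the post-loading routine runs exactly once, whether the last load finishes before or after the scheduler returns.

```python
import threading


class ZoneLoad:
    """Hypothetical Python stand-in for a ref-counted zoneload_t."""

    def __init__(self, on_all_loaded):
        self._refs = 1  # reference held by the scheduler itself
        self._lock = threading.Lock()
        self._on_all_loaded = on_all_loaded

    def ref(self):
        with self._lock:
            self._refs += 1

    def unref(self):
        with self._lock:
            self._refs -= 1
            last = self._refs == 0
        if last:
            # "Destructor": runs once, in whichever thread drops the last ref.
            self._on_all_loaded()


def load_zones(zone_names, on_all_loaded):
    """Start one loader thread per zone and return the started threads."""
    zl = ZoneLoad(on_all_loaded)

    def load_one(_name):
        # A real loader would read the zone database here.
        zl.unref()

    threads = []
    for name in zone_names:
        zl.ref()  # taken before the job can possibly run
        worker = threading.Thread(target=load_one, args=(name,))
        worker.start()  # the job may finish before load_zones() returns
        threads.append(worker)
    zl.unref()  # drop the scheduler's own reference last
    return threads
```

Because the scheduler's reference is dropped only after every job has taken its own, the counter cannot reach 0 while loads are still being scheduled.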
The configloading system test script issues multiple `rndc
{reconfig,reload}` commands without ensuring that the server has left
exclusive mode, which would normally raise an RNDC error as the server
is still reloading. This used to work because the request was
enqueued while the server was in exclusive mode, and was processed
after the server `reload_status` was reset to `NAMED_RELOAD_DONE`.
Because `load_zones()` no longer retakes exclusive mode after
`apply_configuration()`, the scheduling of pending tasks has changed
and the RNDC command sent by the test is regularly processed before
`NAMED_RELOAD_DONE` is set. This is the same kind of issue the views
system tests had, solved by
`4b2dcb3128fbd5af4609a5a73aeeee1f93bde237`.
Fix the problem by waiting for a log line matching the end of
the reloading phase.
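The waiting step can be sketched as a small polling helper. This is a generic illustration, not the helper the system test framework actually uses: `wait_for_line()`, its parameters, and the pattern and file name in the usage note are all assumptions.

```python
import re
import time


def wait_for_line(path, pattern, timeout=10.0, interval=0.1):
    """Return True once a line of the file at `path` matches `pattern`.

    Returns False if no line matches before `timeout` seconds elapse.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, encoding="utf-8", errors="replace") as log:
                if any(regex.search(line) for line in log):
                    return True
        except FileNotFoundError:
            pass  # the server may not have created the log file yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would call something like `wait_for_line("ns1/named.run", r"<end-of-reload message>")` before sending the next `rndc` command, with the placeholder replaced by whatever message the server logs at the end of the reload.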
The `reload_status` was set to `NAMED_RELOAD_FAILED` only after the log
line about this change was printed. Update `reload_status` first, to
avoid the (unlikely) case where a test waiting for this log line sends
an RNDC reload command that `named` processes before the status is
updated.
Remove the exclusive mode when scheduling the zone load right after
(re)loading the `named` configuration, as there is no longer any reason
to schedule zone loading while the exclusive lock is held. Data which
can be read or written by multiple threads are locked or atomic.
The prefetch configuration option now enforces its bounds. The configuration (including when checked with `named-checkconf`) now fails if the trigger (first value) is above 10, or if the eligibility (second optional value) isn't at least six seconds greater than the trigger value.
Merge branch 'colin/prefetch-enforcebounds' into 'main'
See merge request isc-projects/bind9!11243
The prefetch statement now enforces its bounds. The configuration
(including `named-checkconf`) now fails if the trigger (first value) is
above 10, or if the eligibility (second optional value) isn't at least
six seconds more than the trigger value.
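The two bounds can be written down as a small validator. This is a sketch of the rule as stated here, not the parser code: `check_prefetch()` is a hypothetical name, and any other limits or default values of the option are deliberately left out because they are not described in this message.

```python
def check_prefetch(trigger, eligibility=None):
    """Return a list of error strings; an empty list means the values pass."""
    errors = []
    if trigger > 10:
        errors.append(f"trigger {trigger} is above 10")
    # The eligibility value is optional; it is only checked when given.
    if eligibility is not None and eligibility < trigger + 6:
        errors.append(
            f"eligibility {eligibility} is not at least 6 seconds "
            f"greater than trigger {trigger}"
        )
    return errors
```

For example, `check_prefetch(2, 9)` returns an empty list, while `check_prefetch(2, 7)` reports the eligibility error and `check_prefetch(11)` reports the trigger error.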
Catalog zones can't be used in a view whose class is not IN. This is
now enforced: the server refuses to load (instead of loading without
the catalog zone) if such a configuration is detected. This
configuration error is now also caught by `named-checkconf`.
Merge branch 'colin/catz-enforce-non-in' into 'main'
See merge request isc-projects/bind9!11245
Catalog zones can't be used in views whose class is not IN. This is
now enforced: the server refuses to load (instead of loading without
the catalog zone). This configuration error is now also caught
by `named-checkconf`.
The `need_hints` parameter of `configure_view()` is removed, as this
function was always called with the value `true`.
`need_hints` was barely used in the function: its only purpose was to
emit a warning, which can be done simply in an `else` branch.
Moreover, in the case of catalog zones and response-policy, this fixes a
possible bug that would affect root zones, as those wouldn't be reverted
back to their previous version in case the view fails to load
(during a server reconfiguration).
The `rndc -h` usage text was missing the newly introduced `showconf`
commands. Add them.
Merge branch 'colin/fix-rndc-usage' into 'main'
See merge request isc-projects/bind9!11246
Do not save the text version of the effective configuration when
`allow-new-zones` is enabled, as in that case the object tree can
be printed on demand, reducing unnecessary memory consumption.
Merge branch 'colin/no-effective-config-as-text-allownewzones' into 'main'
See merge request isc-projects/bind9!11242
Do not save the textual version of the effective configuration when
`allow-new-zones` is enabled, as it can be printed on demand. This
reduces the memory footprint by ~70MB on huge configurations
(1M zones).
The `dns_zoneflg_t` enum defines multiple possible flags for a zone, but
contains numerous holes (likely from flags removed in the past). This
removes the holes and uses bit-shift with decimal notation to make holes
easier to spot.
Merge branch 'colin/remove-zoneflag-holes' into 'main'
See merge request isc-projects/bind9!11189
The `dns_zoneflg_t` enum defines multiple possible flags for a zone, but
contains numerous holes (likely from flags removed in the past). This
removes the holes and uses bit-shift with decimal notation to make holes
easier to spot.
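To illustrate why the bit-shift notation helps, here is a Python analogue. The real enum is C, and the member names below are hypothetical; the actual `dns_zoneflg_t` flags are not listed in this message. With `1 << n` and a decimal `n`, consecutive members simply count up, so a missing position stands out; `find_holes()` is an assumed helper that makes the same check mechanically.

```python
from enum import IntFlag


class ZoneFlag(IntFlag):
    # Hypothetical flag names, one bit each, with no gaps.
    FIRST = 1 << 0
    SECOND = 1 << 1
    THIRD = 1 << 2
    FOURTH = 1 << 3


def find_holes(flags):
    """Return the bit positions below the highest flag that no member uses."""
    used = {member.value.bit_length() - 1 for member in flags}
    return [bit for bit in range(max(used) + 1) if bit not in used]
```

With the same values written in hexadecimal (`0x1`, `0x2`, `0x4`, `0x8`, `0x10`, ...), a skipped bit is much harder to notice by eye.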
A `cfg_obj_t` object tree structure takes up considerably more space than the equivalent canonical text. If `allow-new-zones` is disabled and catalog zones are not in use, then we don't need the object tree. By storing the configuration in text format, we can use less memory, and `rndc showconf` and `rndc showzone` still work.
Merge branch 'each-cfg-as-text' into 'main'
See merge request isc-projects/bind9!11236
the effective configuration tree is now detached if allow-new-zones
or catalog-zones aren't enabled in any views. this reduces memory
consumption while still allowing "rndc showconf -effective" to work.
as previously mentioned in commit c65b2868ab, a cfg_obj_t
configuration tree structure takes up considerably more space than
the canonical text. since the zone configuration saved in the zone
object using dns_zone_setcfg() is only currently used for "rndc
showzone", it can be saved as text more efficiently than as an
object tree. (and, if a tree were needed, the text could be
re-parsed quickly; zone configuration text is generally small.)
Detect implicit casts from a boolean into an int, or from an
isc_result_t into a boolean (either in an assignment or in return
position).
If such a pattern is found, a warning comment is added to the code (and
the CI will fail) so the error can be spotted and manually fixed.
Merge branch 'colin/cocci-detect-iscresult-int-implicit-casts' into 'main'
See merge request isc-projects/bind9!11095
As the implicit cast check prints "WARNING: ..." on stderr, add a pattern
to make sure that check-cocci fails if such a warning is found on
stderr. This is generic (not specific like the existing "parse error")
so it should be able to support future Coccinelle spatch warnings.
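The idea of failing on stderr patterns can be sketched as follows. This is a Python illustration only and says nothing about how the check-cocci script itself is written; `FAILURE_PATTERNS` and `stderr_has_failure()` are hypothetical names, and the two patterns are the ones quoted above.

```python
import re

# One specific pattern and one generic pattern, as described in the text.
FAILURE_PATTERNS = (
    re.compile(r"parse error"),
    re.compile(r"WARNING:"),
)


def stderr_has_failure(stderr_text):
    """Return True if any failure pattern appears in the captured stderr."""
    return any(pattern.search(stderr_text) for pattern in FAILURE_PATTERNS)
```

A wrapper would run the tool, capture its stderr, and exit with a non-zero status when `stderr_has_failure()` returns True.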
The `display_rrcomments` is a tri-state (-1, 0, 1) which is (in some
cases) initialized with `state`, a boolean, through an implicit cast.
This was spotted by Coccinelle. Remove the implicit cast by explicitly
assigning 0 or 1 to `display_rrcomments` based on the value of `state`.
Add a utility function to check for EDE codes present in the DNS
message. The primary benefit of this helper function is that it
handles the compatibility issues with different dnspython versions
and the actual test code doesn't have to deal with that any more.
Merge branch 'nicki/isctest-check-ede-helper' into 'main'
See merge request isc-projects/bind9!11182
Previously, hasattr("extended_errors") was used to detect the
minimum required dnspython version, so that the EDE check was only
performed if a new-enough dnspython was present. This is now abstracted
into isctest.check.ede().
In order to support dnspython<2.2.0, use isctest.compat.EDECode rather
than using dns.edns.EDECode directly.
Add a utility function to check for EDE options present in the DNS
message. The primary benefit of this helper function is that it
handles the compatibility issues with different dnspython versions
and the actual test code doesn't have to deal with that any more.
Rather than using the convenience .extended_errors() method
introduced in dnspython 2.7.0, iterate over the options and find
EDEOption types, which is supported from 2.2.0 onwards.
To work around the issue of using dns.edns.EDECode to specify EDE codes
in our tests, create an isctest.compat.EDECode wrapper. This can be used
even with dnspython versions prior to 2.2.0 and will simply result in a
no-op, since EDE isn't supported in the older dnspython anyway.
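The option-iteration approach can be sketched roughly as below. This is not the actual isctest.check.ede() or isctest.compat code: `ede_codes()` and its `ede_type` parameter are hypothetical, and the only dnspython names relied on are `dns.edns.EDEOption`, its `code` attribute, and the `options` attribute of a message.

```python
try:
    from dns.edns import EDEOption  # available from dnspython 2.2.0 onwards
except ImportError:  # older dnspython, or dnspython not installed
    EDEOption = None


def ede_codes(message, ede_type=EDEOption):
    """Return the sorted EDE info-codes found in message.options.

    Returns None when EDE options are not supported at all, so that a
    caller can turn the check into a no-op.
    """
    if ede_type is None:
        return None
    return sorted(
        int(option.code)
        for option in message.options
        if isinstance(option, ede_type)
    )
```

A test helper built on this could compare the returned list with the expected codes and skip the comparison when the result is None.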
When shutting down an fctx, validators can just be canceled
without checking whether there are pending finds.
Merge branch 'ondrej/remove-maybe_cancel_validators' into 'main'
See merge request isc-projects/bind9!11229
A part of the `views` system test attempts to add multiple zones in a
loop and, after each zone is added, reconfigures the server.
However, the test didn't take into account the fact that the server
might take a bit more time to reload than the script takes to move to
the next iteration, and in some cases the test was re-requesting the
server reload while it was still reloading.
Since `b49f83a3`, `named` explicitly fails to reload when a load/reload
is pending, which is (unless proved otherwise) the reason why the test
was now randomly failing.
That part of the test now waits for the server log message saying
the server has added the new zone and is running. Also, that part of the
test has been rewritten in Python.
Closes #5617
Merge branch '5617-rewrite-reload-view-test' into 'main'
See merge request isc-projects/bind9!11225
Harden the `ede24` system test in order to avoid random failures, likely caused by timing issues. Also remove expiration-related dead code (which should have been done in the original ede24 changes), and print the query ID in the query trace log, as this should be useful for debugging further flaky system test issues (in particular this one, if the changes made here are not enough).
Closes #5625
Merge branch '5625-fix-ede24-test' into 'main'
See merge request isc-projects/bind9!11217
Because the ede24 system tests require stopping/restarting the server,
there is always the risk that a test ends (with a failure) with the
server in a wrong and unpredictable state. This would make the other
tests fail in a strange way as well.
To avoid this problem, split the test into different modules, so if a
module fails, the other module is not impacted as it uses separate
server instances.
There was a random failure of the ede24 system test. While this is still
a bit speculative, the likely reasons were:
- in the case of `test_ede24_noloaded`, the test might attempt to query
  ns2 too early (before the zone is actually transferred to the
  secondary server).
- still in the case of `test_ede24_noloaded`, even after waiting for
  the transfer success logs, if the CI machine is slow, the zone could
  expire before the request checking the secondary zone works, because
  the expiration time of the zone was very short (1s). Moving this
  expiration time to 3 seconds should be enough (while not making the
  test execution too much longer when waiting for the zone expiration).
- in the case of `test_ede24_expired`, the zone expired flag is flipped
  and the log message is printed immediately after. However, because
  the flag is set using a relaxed atomic operation, it is possible that
  another thread processes the query and gets the previous
  (non-expired) value of the flag. In order to work around this, the
  test now also expects another log message written after the zone
  expiration (stop timers) on the next UV tick.
Add the query ID to the query trace message. The log now looks like the
following (the id is at the end):
    query client=0x7f75c5017000 thread=0x7f75c6dfe680(foo.fr/A): \
    client attr:0x22300, query attr:0x700, restarts:0, \
    origqname:foo.fr, timer:0, authdb:0, referral:0, id:21338
This should help debugging tests, in particular to quickly get a
specific query from the logs.
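As an example of how a test or a debugging session might use the new field, the sketch below pulls the id out of a trace line shaped like the one shown above. The helper names `query_id()` and `lines_for_query()` are hypothetical, and the regular expression assumes the `id:` field stays at the end of the line.

```python
import re

# Matches the trailing "id:<number>" field of a query trace line.
TRACE_ID = re.compile(r"\bid:(\d+)\s*$")


def query_id(trace_line):
    """Return the query ID at the end of a trace line, or None if absent."""
    match = TRACE_ID.search(trace_line)
    return int(match.group(1)) if match else None


def lines_for_query(log_lines, wanted_id):
    """Return only the trace lines that belong to the given query ID."""
    return [line for line in log_lines if query_id(line) == wanted_id]
```

Given the ID of the query a test sent, `lines_for_query()` narrows a long log down to the trace lines for that one query.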
There is code duplication between `keyfetch` and `nsfetch`. Refactor to allow common code paths to differentiate between them. This is in preparation for supporting generalized DNS notifications, which will require fetching DSYNC records.
Merge branch 'matthijs-refactor-zone-fetch' into 'main'
See merge request isc-projects/bind9!11176
Scheduling and rescheduling a zonefetch is also similar. Refactor into
zonefetch functions. This also increments and decrements the zone's
internal reference counter in the same module, which may be less
confusing when reading the code.
Not doing this has led to breakage caused by different dnspython
versions on different platforms only discovered in full nightly
pipelines.
Add a triggering rule for MRs changing code in bin/test/system.
Apply this rule to all nightly-only system test jobs.
Merge branch 'stepan/run-all-system-tests-on-system-test-change' into 'main'
See merge request isc-projects/bind9!11214
Compare only the dumped configuration, as `cfg_printx` does not NUL-terminate the configuration strings.
Merge branch 'colin/fix-parser-test' into 'main'
See merge request isc-projects/bind9!11215