The signatures-refresh should not near the signatures-validity value,
to prevent operational instability. Same is true when checking against
signatures-validity-dnskey.
Previously, tasks could be created either unbound or bound to a specific
thread (worker loop). The unbound tasks would be assigned to a random
thread every time isc_task_send() was called. Because there's no logic
that would assign the task to the least busy worker, this just creates
unpredictability. Instead of random assignment, bind all the previously
unbound tasks to worker 0, which is guaranteed to exist.
Give a little bit more time if we wait on a time out from the
authoritative (aka resolver failure), and give up after one try
(because the second attempt will likely result in a different EDE).
When there is no time in a key file, `dnssec-settime` will print
"UNSET", but to unset a time the user must specify "none" or "never".
This change allows "unset" or "UNSET" as well as "none" or "never".
The "UNSET" output remains the same to avoid compatibility problems
with wrapper scripts.
I have also re-synchronized the "Timing Options" sections of the man
pages.
The dns_message_gettempname(), dns_message_gettemprdata(),
dns_message_gettemprdataset(), and dns_message_gettemprdatalist() always
succeeds because the memory allocation cannot fail now. Change the API
to return void and cleanup all the use of aforementioned functions.
7249bad7 introduced the -c option to stat(1) command, but BSD systems
do not know about it. Replace the stat(1) command with a PERL script
that achieves the same.
Why PERL? For consistency purposes, there are more places in the
system test where we use the same method.
Check that the recursing client count is above a reasonable
minimum, as well as below a maximum, so that we can detect
bugs that cause recursion to fail too early or too often.
The fetchlimit test depends on a resolver continuing to try UDP
and timing out while the client waits for resolution to succeed.
but since commit bb990030 (flag day 2020), a fetch will always
switch to TCP after two timeouts, unless EDNS was disabled for
the query.
This commit adds "edns no;" to server statements in the fetchlimit
resolver, to restore the behavior expected by the test.
Add a test case that triggers a keymgr run that will not trigger any
metadata changes. Ensure that the last status change of the key files
is unmodified.
After removing the isc_task_onshutdown(), the isc_task_shutdown() and
isc_task_destroy() became obsolete.
Remove calls to isc_task_shutdown() and replace the calls to
isc_task_destroy() with isc_task_detach().
Simplify the internal logic to destroy the task when the last reference
is removed.
RPZ NSIP and NSDNAME checks were failing with "unrecognized NS
rpz_rrset_find() failed: glue" when static or static-stub zones
where used to resolve the query name.
Add tests using stub and static-stub zones that are expected to
be filtered and not-filtered against NSIP and NSDNAME rules.
stub and static-stub queries are expected to be filtered
stub-nomatch and static-stub-nomatch queries are expected to be passed
The dig commands appear to be failing unexpectedly on some platforms
when rate limiting kicks in and the response is dropped. Correct
behaviour should be for dig to retry the query. Set +qr and capture
stdout and stderr of each of the dig commands involved.
In '_check_apex_dnskey' we check for each key (KEY1 to KEY4) if they
are present in the DNSKEY RRset if they should be.
However, we only grep the dig output for the first seven fields (owner,
ttl, class, type, flags, protocol, algorithm). This can be the same
for different keys.
For example, KEY1 may be KSK predecessor and KEY2 a KSK successor,
both DNSKEY records for these keys are the same up to the public key
field. This can cause test failures if KEY1 needs to be present, but
KEY2 not, because when grepping for KEY2 we will falsely detect the
key to be present (because the grep matches KEY1).
Fix the function by grepping looking for the first seven fields in the
corresponding key file and retrieve the public key part. Grep for this
in the dig output.
The DNS catalog zones draft version 5 document requires that catalog
zones consumers must reset the member zone's internal zone state when
its unique label changes (either within the same catalog zone or
during change of ownership performed using the "coo" property).
BIND already behaves like that, and, in fact, doesn't support keeping
the zone state during change of ownership even if the unique label
has been kept the same, because BIND always removes the member zone
and adds it back during unique label renaming or change of ownership.
Document the described behavior and add a log message to inform when
unique label renaming occurs.
Add a system test case with unique label renaming.
There is already a check for the missing version property case
(catalog-bad1.example), and this new test should result in the same
outcome, but differs in a way that there exists a version record in the
zone, but it is of a wrong type (A instead of the expected TXT).
According to DNS catalog zones draft version 5 document, the CLASS field
of every RR in a catalog zone MUST be IN.
Add a new check in the catz system test to verify that a non-IN class
catalog zone (in this case CH) fails to load.
BIND does not support having a non-IN class RR in an IN class zone, or
non-IN class zone in an IN class view, so to verify that BIND respects
the mentioned restriction we must try to add a non-IN class catalog
zone and check that it didn't succeed.
The `named` configuration files had to be restructured to put all the
zones inside views, which also resulted in some corresponding changes
in the tests.sh script.
The DNS catalog zones draft version 5 document describes various
situations when a catalog zones must be considered as "broken" and
not be processed.
Implement those checks in catz.c and add corresponding system tests.
Add DNS extended errors 3 (Stale Answer) and 19 (Stale NXDOMAIN Answer)
to responses. Add extra text with the reason why the stale answer was
returned.
To test, we need to change the configuration such that for the first
set of tests the stale-refresh-time window does not interfer with the
expected extended errors.
PyLint 2.13.7 reports the following error:
bin/tests/system/doth/conftest.py:34:28: E0601: Using variable 'stderr' before assignment (used-before-assignment)
The reason the current code has not caused problems before is that
invoking gnutls-cli with just the --logfile=/dev/null argument causes it
to always return with a non-zero exit code, either due to the option not
being supported or due to the hostname argument not being provided. In
other words, the 'except' branch has always been taken. PyLint is
obviously right on a syntactical level, though.
Instead of relying on a less than obvious code flow (where the 'except'
branch is always taken), rework the flagged code by employing
subprocess.run(..., check=False) instead of subprocess.check_output(),
making exception handling redundant.
While this issue was investigated, it was also noticed that
subprocess.check_output() was incorrectly used as a context manager:
Popen objects are context managers, but subprocess.check_output() and
subprocess.run() are not. Fix by dropping the relevant 'with'
statement.
Commit 3ec5d2d6ed added a Python-based
name server (bin/tests/system/digdelv/ans8/ans.py) to the "digdelv"
system test, but did not update bin/tests/system/Makefile.am to ensure
Python is present in the test environment before the "digdelv" system
test is run. Update bin/tests/system/Makefile.am to enforce that
requirement.
There were two problems in the notify system test when it waited for
log messages to appear: the shellcheck refactoring introduced a call
to `wait_for_log` with a regex, but `wait_for_log` only supports fixed
strings, so it always ran for the full 45 second timeout; and the new
test to ensure that notify messages time out failed to reset the
nextpart pointer, so if the notify messages timed out before the test
ran, it would fail to see them.
This change adds a `wait_for_log_re` helper that matches a regex, and
uses it where appropriate in the notify system test, which stops the
test from waiting longer than necessary; and it resets the nextpart
pointer so that the notify timeout test works reliably.
Closes#3275
Prime the cache with a negative cache DS entry then make a query for
name beneath that entry. This will cause the DS entry to be retieved
as part of the validation process. Each RRset in the ncache entry
will be validated and the trust level for each will be updated.
Catalog zones change of ownership is special mechanism to facilitate
controlled migration of a member zone from one catalog to another.
It is implemented using catalog zones property named "coo" and is
documented in DNS catalog zones draft version 5 document.
Implement the feature using a new hash table in the catalog zone
structure, which holds the added "coo" properties for the catalog zone
(containing the target catalog zone's name), and the key for the hash
table being the member zone's name for which the "coo" property is being
created.
Change some log messages to have consistent zone name quoting types.
Update the ARM with change of ownership documentation and usage
examples.
Add tests which check newly the added features.
According to DNS catalog zones draft version 5 document, catalog
zone custom properties must be placed under the "ext" label.
Make necessary changes to support the new custom properties syntax in
catalog zones with version "2" of the schema.
Change the default catalog zones schema version from "1" to "2" in
ARM to prepare for the new features and changes which come starting
from this commit in order to support the latest DNS catalog zones draft
document.
Make some restructuring in ARM and rename the term catalog zone "option"
to "custom property" to better reflect the terms used in the draft.
Change the version of 'catalog1.zone.' catalog zone in the "catz" system
test to "2", and leave the version of 'catalog2.zone.' catalog zone at
version "1" to test both versions.
Add tests to check that the new syntax works only with the new schema
version, and that the old syntax works only with the legacy schema
version catalog zones.
Add a test case for a dynamically added CDS DELETE record and make
sure it is not removed when signing the zone. This happens because
BIND maintains CDS and CDNSKEY publishing and it will only allow
CDS DELETE records if the zone is transitioning to insecure. This is
a state that can be identified when using KASP through 'dnssec-policy',
but not when using 'auto-dnssec'.
Commit bf3fffff67 added a Python-based
name server (bin/tests/system/forward/ans11/ans.py) to the "forward"
system test, but did not update bin/tests/system/Makefile.am to ensure
Python is present in the test environment before the "forward" system
test is run. Update bin/tests/system/Makefile.am to enforce that
requirement.
Implement TCP support in the `ans11` Python-based DNS server.
Implement a control command channel in `ans11` to support an optional
silent mode of operation, which, when enabled, will ignore incoming
queries.
In the added check, make the `ans11` the NS server of
"a.root-servers.nil." for `ns3`, so it uses `ans11` (in silent mode)
for the regular (non-forwarded) name resolutions.
This will trigger the "hung fetch" scenario, which was causing `named`
to crash.
- Check that an NS in an authority section returned from a forwarder
which is above the name in a configured "forward first" or "forward
only" zone (i.e., net/NS in a response from a forwarder configured for
local.net) is not cached.
- Test that a DNAME for a parent domain will not be cached when sent
in a response from a forwarder configured to answer for a child.
- Check that glue is rejected if its name falls below that of zone
configured locally.
- Check that an extra out-of-bailiwick data in the answer section is
not cached (this was already working correctly, but was not explicitly
tested before).
Add a test case to check for lingering TCP sockets stuck in the
CLOSE_WAIT state. This can happen if a client sends some garbage after
its first query.
The system test runs the reproducer script and then sends another TCP
query to the resolver. The resolver is configured to allow one TCP
client only. If BIND has its TCP socket stuck in CLOSE_WAIT, it does
not have the resources available to answer the second query.
Note: A better test would be to check if the named daemon does not
have a TCP socket stuck in CLOSE_WAIT for example with netstat. When
running this test locally you can examine named with netstat manually.
But since netstat is platform specific it is not a good candidate to do
this as a system test.
If you, if you could return, don't let it burn.
Do you have to let it linger?
- Cranberries
There are a couple of problems with dns_request_createvia(): a UDP
retry count of zero means unlimited retries (it should mean no
retries), and the overall request timeout is not enforced. The
combination of these bugs means that requests can be retried forever.
This change alters calls to dns_request_createvia() to avoid the
infinite retry bug by providing an explicit retry count. Previously,
the calls specified infinite retries and relied on the limit implied
by the overall request timeout and the UDP timeout (which did not work
because the overall timeout is not enforced). The `udpretries`
argument is also changed to be the number of retries; previously, zero
was interpreted as infinity because of an underflow to UINT_MAX, which
appeared to be a mistake. And `mdig` is updated to match the change in
retry accounting.
The bug could be triggered by zone maintenance queries, including
NOTIFY messages, DS parental checks, refresh SOA queries and stub zone
nameserver lookups. It could also occur with `nsupdate -r 0`.
(But `mdig` had its own code to avoid the bug.)
In `send_udp()` and `launch_next_query()` functions, when calling
`dighost_printmessage()` to print detailed information about the
sent query, dig always prints the data of the first query in the
lookup's queries list.
The first query in the list can be already finished, having its handles
freed, and accessing this information results in assertion failure.
Print the current query's information instead.
In recv_done(), when dig decides to start the lookup's next query in
the line using `start_udp()` or `start_tcp()`, and for some reason,
no queries get started, dig doesn't cancel the lookup.
This can occur, for example, when there are two queries in the lookup,
one with a regular IP address, and another with a IPv4 mapped IPv6
address. When the regular IP address fails to serve the query, its
`recv_done()` callback starts the next query in the line (in this
case the one with a mapped IP address), but because `dig` doesn't
connect to such IP addresses, and there are no other queries in the
list, no new queries are being started, and the lookup keeps hanging.
After calling `start_udp()` or `start_tcp()` in `recv_done()`, check
if there are no pending/working queries then cancel the lookup instead
of only detaching from the current query.
After switching to per-thread resources in the zonemgr, the performance
was decreased because the memory context, zonetask and loadtask was
picked from the pool at random.
Pin the zone to single threadid (.tid) and align the memory context,
zonetask and loadtask to be the same, this sets the hard affinity of the
zone to the netmgr thread.
a test case in the 'resolver' system test was reliant on
logged output that would only be present when query tracing
was enabled, as in developer builds. that test case is now
disabled when query tracing is not available. Thanks to
Anton Castelli.
the line "$GENERATE 19-28/2147483645 $ CNAME x" should generate
a single CNAME with the owner "19.example.com", but prior to the
overflow bug it generated several CNAMEs, half of them with large
negative values.
we now test for the bugfix by using "named-checkzone -D" and
grepping for a single CNAME in the output.
Ensure the update zone name is mentioned in the NOTAUTH error message
in the server log, so that it is easier to track down problematic
update clients. There are two cases: either the update zone is
unrelated to any of the server's zones (previously no zone was
mentioned); or the update zone is a subdomain of one or more of the
server's zones (previously the name of the irrelevant parent zone was
misleadingly logged).
Closes#3209
The use of isc_appctx_t in dns_client was used to wait for
dns_client_startresolve() to finish the processing (the resolve_done()
task callback).
This has been replaced with standard bool+cond+lock combination removing
the need of isc_appctx_t altogether.
Previously, the isc_ht API would always take the key as a literal input
to the hashing function. Change the isc_ht_init() function to take an
'options' argument, in which ISC_HT_CASE_SENSITIVE or _INSENSITIVE can
be specified, to determine whether to use case-sensitive hashing in
isc_hash32() when hashing the key.