If we are updating the lifetime, and it was not set before, also
set/update the Retired and Removed timing metadata.
(cherry picked from commit 3e836a87e6)
The dnssec-keygen command for the ZSK generation for the zone
multisigner-model2.kasp was wrong (no ZSK was generated in the setup
script, but when 'named' is started, the missing ZSK was created
anyway by 'dnssec-policy'.
(cherry picked from commit b93cb2e80e)
Only set the next time the keymgr should run if the value is non zero.
Otherwise we default back to one hour. This may happen if there is one
or more key with an unlimited lifetime.
(cherry picked from commit 6c6b8796d3)
The keymgr never set the expected timing metadata when CDS/CDNSKEY
records for the corresponding key will be removed from the zone. This
is not troublesome, as key states dictate when this happens, but with
the new pytest we use the timing metadata to determine if the CDS and/or
CDNSKEY for the given key needs to be published.
(cherry picked from commit 8c9d2eb2bf)
There are a couple of cases where the safety intervals are added
inappropriately:
1. When setting the PublishCDS/SyncPublish timing metadata, we don't
need to add the publish-safety value if we are calculating the time
when the zone is completely signed for the first time. This value
is for when the DNSKEY has been published and we add a safety
interval before considering the DNSKEY omnipresent.
2. The retire-safety value should only be added to ZSK rollovers if
there is an actual rollover happening, similar to adding the sign
delay.
3. The retire-safety value should only be added to KSK rollovers if
there is an actual rollover happening. We consider the new DS
omnipresent a bit later, so that we are forced to keep the old DS
a bit longer.
(cherry picked from commit 63edc4435f)
While converting the kasp system test to pytest, I encountered a small
bug in the keymgr code. We retire keys when there is more than one
key matching a 'keys' line from the dnssec-policy. But if there are
multiple identical 'keys' lines, as is the case for the test zone
'checkds-doubleksk.kasp', we retire one of the two keys that have the
same properties.
Fix this by checking if there are double matches. This is not fool proof
because there may be many keys for a few identical 'keys' lines, but it
is good enough for now. In practice it makes no sense to have a policy
that dictates multiple keys with identical properties.
(cherry picked from commit ef671919d5)
If the `max-clients-per-query` option is set to a lower value than `clients-per-query`, the value is adjusted to match `clients-per-query`.
Closes#5224
Backport of MR !10241
Merge branch 'backport-5224-raise-max-clients-per-query-to-be-at-least-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10244
In the case where 'clients-per-query' is larger than
'max-clients-per-query', raise 'max-clients-per-query' so that
'clients-per-query' equals 'max-clients-per-query' and log a warning
that this is what happened.
(cherry picked from commit f6f9645ed1)
The new intended behavior is that 'max-clients-per-query' value is
raised to equal 'clients-per-query' if it is lower.
(cherry picked from commit f50753f303)
Raw integer pointers were being used for the validator's nvalidations
and nfails values but the memory holding them could be freed before
they ceased to be used. Use reference counted counters instead.
Closes#5239
Backport of MR !10248
Merge branch 'backport-5239-use-counter-for-nvalidations-and-nfailss-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10300
The fetch context that held these values could be freed while there
were still active pointers to the memory. Using a reference counted
pointer avoids this.
(cherry picked from commit bfbaacc9a0)
When query responses timed out, the resolver could incorrectly increase the regular responses counters, even if no response was received. This has been fixed.
Closes#5193
Backport of MR !10227
Merge branch 'backport-5193-resolver-statistics-counters-fix-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10287
When a response times out the fctx_cancelquery() function
incorrectly calculates it in the 'dns_resstatscounter_queryrtt5'
counter (i.e. >=1600 ms). To avoid this, the rctx_timedout()
function should make sure that 'rctx->finish' is NULL. And in order
to adjust the RTT values for the timed out server, 'rctx->no_response'
should be true. Update the rctx_timedout() function to make those
changes.
(cherry picked from commit 830e548111)
The resquery_response() function increases the response counter without
checking if the response was successful. Increase the counter only when
the result indicates success.
(cherry picked from commit 12e7dfa397)
This branch started off as `michal/upforwd-asyncserver`. It quickly
turned out that the critical `asyncserver.py` change that was needed for
the `upforwd` system test was for the server to be able to read multiple
TCP queries on a single connection. As currently present in `main`,
`asyncserver.py` closes every client connection after servicing a single
query. Retaining that behavior would cause the `upforwd` system test to
fail and, in general, capturing all data sent by a client seems more
useful in tests than just closing connections quickly. `asyncserver.py`
can always be extended in the future (e.g. by adding a new
`ResponseAction` that the networking code would react to) to reinstate
the original behavior, if it turns out to be necessary.
While working on changing that particular `asyncserver.py` behavior, I
noticed a couple of other deficiencies in the TCP connection handling
code, so I started addressing them. One thing led to another and before
I noticed, enough changes were applied to be worth doing a separate
merge request, particularly given that the actual rewrite of
`upforwd/ans4/ans.pl` using `asyncserver.py` is trivial once the
required changes to `asyncserver.py` itself are applied.
Backport of MR !10276
Merge branch 'backport-michal/asyncserver-tcp-improvements-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10284
Instead of closing every incoming TCP connection after handling a single
query, continue receiving queries on each TCP connection until the
client disconnects itself. When coupled with response dropping, this
enables silently receiving all incoming data, simulating an unresponsive
server.
(cherry picked from commit 575a874582)
A TCP DNS client may send its queries in chunks, causing
StreamReader.read() to return less data than previously declared by the
client as the DNS message length; even the two-octet DNS message length
itself may be split up into two single-octet transmissions. Sending
data in chunks is valid client behavior that should not be treated as an
error. Add a new helper method for reading TCP data in a loop, properly
distinguishing between chunked queries and client disconnections. Use
the new method for reading all TCP data from clients.
(cherry picked from commit 68fe9a5df5)
Emit more log messages from TCP connection handling code and extend
existing ones to improve debuggability of servers using asyncserver.py.
(cherry picked from commit 8c3f673f37)
A TCP peer may reset the connection at any point, but asyncserver.py
currently only handles connection resets when it is sending data to the
client. Handle connection resets during reading in the same way.
(cherry picked from commit 748ed4259b)
Prevent premature client disconnections during reading from triggering
unhandled exceptions in TCP connection handling code.
(cherry picked from commit e4c3186a7c)
Add a helper class, Peer, which holds the <host, port> tuple of a
connection endpoint and gets pretty-printed when formatted as a string.
This enables passing instances of this new class directly to logging
functions, eliminating the need for the AsyncDnsServer._format_peer()
helper method.
(cherry picked from commit 5764a9d660)
The false positive rate is about 10-20 % when evaluating shotgun results
from a single run. Attempt to reduce the false positive rate by allowing
a re-run of failed jobs.
Backport of MR !10271
Merge branch 'backport-nicki/ci-shotgun-reduce-false-positives-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10279
The false positive rate is about 10-20 % when evaluating shotgun results
from a single run. Attempt to reduce the false positive rate by allowing
a re-run of failed jobs.
While there is a slight risk that barely noticable decreases in
performance might slip by more easily in MRs, they'd still likely pop up
during nightly or pre-release testing.
Also increase the tolerance threshold for DoH latency comparisons, as
those tests often experience increased jitter in the tail end latencies.
(cherry picked from commit 5eab352478)
Replace custom DNS servers used in the "qmin" system test with new code
based on the isctest.asyncserver module. The revised code employs zone
files and a limited amount of custom logic, which massively improves
test readability and maintainability, extends logging, and fixes
non-compliant replies sent by some of the custom servers in response to
certain queries (e.g. AA=0 in authoritative empty non-terminal
responses, non-glue address records in ADDITIONAL section).
Backport of MR !10195
Merge branch 'backport-michal/qmin-asyncserver-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10275
The vulture tool seems to be unable to follow how the parent classes
defined in bin/tests/system/qmin/qmin_ans.py use mandatory properties
specified by child classes in bin/tests/system/qmin/ans*/ans.py. Make
the tool ignore not just ans.py servers, but also *_ans.py utility
modules above the ansX/ subdirectories to prevent false positives about
unused code from causing CI pipeline failures.
(cherry picked from commit dfd37918d6)
Some versions of the Hypothesis Python library - notably the one
included in stock OS repositories for Ubuntu 20.04 Focal Fossa - cause a
.hypothesis file to be created in a Python script's working directory
when the hypothesis module is present in its import chain. Ignore such
files by adding them to the list of expected test artifacts to prevent
pytest teardown checks from failing due to these files appearing in the
file system after running system tests.
(cherry picked from commit f413ddbe5f)
Commit 6c010a5644 caused the PYTHONPATH
environment variable to be set for ans.py servers started using
start.pl. However, no system test has actually used the new
isctest.asyncserver module since that change was applied, so it has not
been noticed until now that including the source directory in PYTHONPATH
is only sufficient for in-tree builds. Include the build directory
instead of the source directory in the PYTHONPATH environment variable
set for ans.py servers started by start.pl so that they work correctly
for both in-tree and out-of-tree builds.
(cherry picked from commit a799dd04ad)
Replace custom DNS servers used in the "qmin" system test with new code
based on the isctest.asyncserver module. The revised code employs zone
files and a limited amount of custom logic, which massively improves
test readability and maintainability, extends logging, and fixes
non-compliant replies sent by some of the custom servers in response to
certain queries (e.g. AA=0 in authoritative empty non-terminal
responses, non-glue address records in ADDITIONAL section).
(cherry picked from commit 7faa34c6ee)
When `-T cookiealwaysvalid` is passed to `named`, DNS cookie checks for
the incoming queries always pass, given they are structurally correct.
Backport of MR !10232
Merge branch 'backport-aram/new-named-minus-T-option-of-cookiealwaysvalid-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10264
When -T cookiealwaysvalid is passed to named, DNS cookie checks for
the incoming queries always pass, given they are structurally correct.
(cherry picked from commit 807ef8545d)
A performance improvement for finding the closest encloser when generating authoritative responses from NSEC3 zones was previously reverted after a bug was found that could trigger an assertion failure. (See #4460, #4950, and #5108 for details.) The bug has now been fixed, and the performance improvement has been restored.
Fixes#5204
Backport of MR !9610
Backport of MR !9928
Merge branch '5108-nsec3-empty-node-bind-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10034
there was a database bug in which dns_db_find() could get a partial
match for the query name, but still set foundname to match the full
query name. this triggered an assertion when query_addwildcardproof()
assumed that foundname would be shorter.
the database bug has been fixed, but in case it happens again, we
can just copy the name instead of splitting it. we will also log a
warning that the closest-encloser name was invalid.
when adding a new NSEC3 record, dns_nsec3_addnsec3() uses a
dbiterator to seek to the newly created node and then find its
predecessor. dbiterators in the qpzone use snapshots, so changes
to the database are not reflected in an already-existing iterator.
consequently, when we add a new node, we have to create a new iterator
before we can seek to it.
this test adds a record with empty non-terminal nodes above it. this
has also been observed to trigger the crash in NSEC3 zones.
NOTE: the test currently fails, because while there is no crash, the
query results are not as expected. when we add a node below an ENT,
receive_secure_serial() gets DNS_R_PARTIALMATCH, and the signed
zone is never updated. this is not a regression from fixing the
crash bug; it's a separate inline-signing bug.
test that there's no crash when querying for a newly-deleted node.
(incidentally also renamed ns3/named.conf.in to ns3/named1.conf.in,
because named2.conf.in does exist, and they should match.)
when an empty node was found, the result was treated as a partial match,
but foundname could still contain the name of the empty node instead of
its parent.
when a requested name is found in the QP trie during a lookup, but its
records have been marked as nonexistent by a previous deletion, then
it's treated as a partial match, but the foundname could be left
pointing to the original qname rather than the parent. this could
lead to an assertion failure in query_findclosestnsec3().
maxlabels is the suffix length that corresponds to the latest
NXDOMAIN response. minlabels is the suffix length that corresponds
to longest found existing name.
(cherry picked from commit 67f31c5046)
Add missing locks in dns_zone_getxfrsource4 et al. Addresses CID 468706, 468708, 468741, 468742, 468785, and 468778.
Cleanup dns_zone_setxfrsource4 et al to now return void.
Remove double copies with dns_zone_getprimaryaddr and dns_zone_getsourceaddr.
Closes#4933
Backport of MR !9485
Merge branch 'backport-4933-add-missing-locks-when-returning-addresses-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10259
Add missing locks in dns_zone_getxfrsource4 et al. Addresses CID
468706, 468708, 468741, 468742, 468785 and 468778.
Cleanup dns_zone_setxfrsource4 et al to now return void.
Remove double copies with dns_zone_getprimaryaddr and dns_zone_getsourceaddr.
(cherry picked from commit d0a59277fb)
The `I:checking that lifting the limit will allow everything to get
cached (20)` test was failing due to the TTL of the records being
too short for the elapsed time of the test. Raise the TTL to fix
this and adjust other tests as needed.
Closes#5206
Backport of MR !10177
Merge branch 'backport-5206-tune-last-sub-test-of-reclimit-9.20' into 'bind-9.20'
See merge request isc-projects/bind9!10249