Extended DNS error mechanism (EDE) enables to have several EDE raised
during a DNS resolution (typically, a DNSSEC query will do multiple
fetches which each of them can have an error). Add support to up to 3
EDE errors in an DNS response. If duplicates occur (two EDEs with the
same code, the extra text is not compared), only the first one will be
part of the DNS answer.
Because the maximum number of EDE is statically fixed, `ns_client_t`
object own a static vector of `DNS_DE_MAX_ERRORS` (instead of a linked
list, for instance). The array can be fully filled (all slots point to
an allocated `dns_ednsopt_t` object) or partially filled (or
empty). In such case, the first NULL slot means there is no more EDE
objects.
there was a database bug in which dns_db_find() could get a partial
match for the query name, but still set foundname to match the full
query name. this triggered an assertion when query_addwildcardproof()
assumed that foundname would be shorter.
the database bug has been fixed, but in case it happens again, we
can just copy the name instead of splitting it. we will also log a
warning that the closest-encloser name was invalid.
this commit removes the deprecated "sortlist" option. the option
is now marked as ancient; it is a fatal error to use it in
named.conf.
the sortlist system test has been removed, and other tests that
referenced the option have been modified.
the enabling functions, dns_message_setsortorder() and
dns_rdataset_towiresorted(), have also been removed.
Previously, the update policy rules check was moved earlier in the
sequence, and the keep rule match pointers were kept to maintain the
ability to verify maximum records by type.
However, these pointers can become invalid if server reloading
or reconfiguration occurs before update completion. To prevent
this issue, extract the maximum records by type value immediately
during processing and only keep the copy of the values instead of the
full ssurule.
Add support for Extended DNS Errors (EDE) error 22: No reachable
authority. This occurs when after a timeout delay when the resolver is
trying to query an authority server.
Instead of cleaning the dns_badcache opportunistically, add per-loop
LRU, so each thread-loop can clean the expired entries. This also
allows removal of the atomic operations as the badcache entries are now
immutable, instead of updating the badcache entry in place, the old
entry is now deleted from the hashtable and the LRU list, and the new
entry is inserted in the LRU.
The code that listens on individual interfaces is now stable and doesn't
require any changes. The code that would bind to IPv6 wildcard address
and then use IPv6 pktinfo structure to get the source address is not
going to be completed, so it's better to just remove the dead cruft.
Add query counters for DoT, DoH, unencrypted DoH and their proxied
counterparts. The protocols don't increment TCP/UDP counters anymore
since they aren't the same as plain DNS-over-53.
QPDB is now a default implementation for both cache and zone. Remove
the venerable RBTDB database implementation, so we can fast-track the
changes to the database without having to implement the design changes
to both QPDB and RBTDB and this allows us to be more aggressive when
refactoring the database design.
RFC 9567 section 8.1 specifies that the agent domain cannot
be a subdomain of the domain it is reporting on. therefore,
in addition to making it illegal to configure that at the
zone level, we also need to disable send-report-channel for
any zone for which the global send-report-channel value is
a subdomain.
we also now warn if send-report-channel is configured
globally to a zone that we host, but that zone doesn't
have log-report-channel set.
the logging of error-report queries is no longer activated by
the view's "send-report-channel" option; that now only configures
the agent-domain value that is to be sent in authoritative
responses. the warning that was logged when "send-agent-domain"
was set to a value that is not a locally configured zone has
been removed.
error-report logging is now activated by the presence of an
authoritative zone with the "log-report-channel" option set to
"yes". this is not permitted in the root zone.
NOTE: a zone with "log-report-channel yes;" should contain a
"*._er" wildcard, but that requirement is not yet enforced.
If send-report-channel is set at the zone level, it will
be stored in the zone object and used instead of the
view-level agent-domain when constructing the EDNS
Report-Channel option.
This commit adds support for the EDNS Report-Channel option,
which is returned in authoritative responses when EDNS is in use.
"send-report-channel" sets the Agent-Domain value that will be
included in EDNS Report-Channel options. This is configurable at
the options/view level; the value is a DNS name. Setting the
Agent-Domain to the root zone (".") disables the option.
When this value has been set, incoming queries matchng the form
_er.<qtype>.<qname>.<extended-error-code>._er.<agent-domain>/TXT
will be logged to the dns-reporting-agent channel at INFO level.
(Note: error reporting queries will only be accepted if sent via
TCP or with a good server cookie. If neither is present, named
returns BADCOOKIE to complete the DNS COOKIE handshake, or TC=1
to switch the client to TCP.)
maxlabels is the suffix length that corresponds to the latest
NXDOMAIN response. minlabels is the suffix length that corresponds
to longest found existing name.
Rename check_recursionquota() to acquire_recursionquota(), and
implement a new function called release_recursionquota() to
reverse the action. It helps with decreasing code duplication.
In two places, after linking the client to the manager's
"recursing-clients" list using the check_recursionquota()
function, the query.c module fails to unlink it on error
paths. Fix the bugs by unlinking the client from the list.
Also make sure that unlinking happens before detaching the
client's handle, as it is the logically correct order, e.g.
in case if it's the last handle and ns__client_reset_cb()
can be called because of the detachment.
The 'nodetach' member is a leftover from the times when non-zero
'stale-answer-client-timeout' values were supported, and currently
is always 'false'. Clean up the member and its usage.
Query and response log shares the same flags. Move flags logging out of
log_query to share it with log_response. Use buffer instead of snprintf
to fill flags a bit faster.
Signed-off-by: Petr Menšík <pemensik@redhat.com>
Remove answer flag from log, log instead count of records for each
message section. Include EDNS version and few flags of response. Add
also status of result.
Still does not include body of responses rrset.
DNSRPS was the API for a commercial implementation of Response-Policy
Zones that was supposedly better. However, it was never open-sourced
and has only ever been available from a single vendor. This goes against
the principle that the open-source edition of BIND 9 should contain only
features that are generally available and universal.
This commit removes the DNSRPS implementation from BIND 9. It may be
reinstated in the subscription edition if there's enough interest from
customers, but it would have to be rewritten as a plugin (hook) instead
of hard-wiring it again in so many places.
Return partial match from dns_db_find/dns_db_find when requested
to short circuit the closest encloser discover process. Most of the
time this will be the actual closest encloser but may not be when
there yet to be committed / cleaned up versions of the zone with
names below the actual closest encloser.
Remove the complicated mechanism that could be (in theory) used by
external libraries to register new categories and modules with
statically defined lists in <isc/log.h>. This is similar to what we
have done for <isc/result.h> result codes. All the libraries are now
internal to BIND 9, so we don't need to provide a mechanism to register
extra categories and modules.
Add isc_logconfig_get() function to get the current logconfig and use
the getter to replace most of the little dancing around setting up
logging in the tools. Thus:
isc_log_create(mctx, &lctx, &logconfig);
isc_log_setcontext(lctx);
dns_log_setcontext(lctx);
...
...use lcfg...
...
isc_log_destroy();
is now only:
logconfig = isc_logconfig_get(lctx);
...use lcfg...
For thread-safety, isc_logconfig_get() should be surrounded by RCU read
lock, but since we never use isc_logconfig_get() in threaded context,
the only place where it is actually used (but not really needed) is
named_log_init().
MAX_RESTARTS is no longer hard-coded; ns_server_setmaxrestarts()
and dns_client_setmaxrestarts() can now be used to modify the
max-restarts value at runtime. in both cases, the default is 11.
the number of steps that can be followed in a CNAME chain
before terminating the lookup has been reduced from 16 to 11.
(this is a hard-coded value, but will be made configurable later.)
dns_difftuple_create() could only return success, so change
its type to void and clean up all the calls to it.
other functions that only returned a result value because of it
have been cleaned up in the same way.
When automatic-interface-scan is disabled, the route socket was still
being opened. Add new API to connect / disconnect from the route socket
only as needed.
Additionally, move the block that disables periodic interface rescans to
a place where it actually have access to the configuration values.
Previously, the values were being checked before the configuration was
loaded.
Due to the maximum query restart limitation a long CNAME chain
it is cut after 16 queries but named still returns NOERROR.
Return SERVFAIL instead and the partial answer.
When sending fails, the ns__client_request() would not reset the
connection and continue as nothing is happening. This comes from the
model that we don't care about failed UDP sends because datagrams are
unreliable anyway, but it greatly affects TCP connections with
keep-alive.
The worst case scenario is as follows:
1. the 3-way TCP handshake gets completed
2. the libuv calls the "uv_connection_cb" callback
3. the TCP connection gets queue because of the tcp-clients quota
4. the TCP client sends as many DNS messages as the buffers allow
5. the TCP connection gets dropped by the client due to the timeout
6. the TCP connection gets accepted by the server
7. the data already sent by the client gets read
8. all sending fails immediately because the TCP connection is dead
9. we consume all the data in the buffer in a very tight loop
As it doesn't make sense to trying to process more data on the TCP
connection when the sending is failing, drop the connection immediately
on the first sending error.
As ns_query_init() cannot fail now, remove the error paths, especially
in ns__client_setup() where we now don't have to care what to do with
the connection if setting up the client could fail. It couldn't fail
even before, but now it's formal.
Clear qctx->zversion when clearing qctx->zrdataset et al in
lib/ns/query.c:qctx_freedata. The uncleared pointer could lead to
an assertion failure if zone data needed to be re-saved which could
happen with stale data support enabled.