If we did not attempt a fetch due to fetch-limits, we should not start
the stale-refresh-time window.
Introduce a new flag DNS_DBFIND_STALESTART to differentiate between
a resolver failure and unexpected error. If we are resuming, this
indicates a resolver failure, then start the stale-refresh-time window,
otherwise don't start the stale-refresh-time window, but still fall
back to using stale data.
(This commit also wraps some docstrings to 80 characters width)
Before this change, BIND will only fallback to using stale data if
there was an actual attempt to resolve the query. Then on a timeout,
the stale data from cache becomes eligible.
This commit changes this so that on any unexpected error stale data
becomes eligble (you would still have to have 'stale-answer-enable'
enabled of course).
If there is no stale data, this may return in an error again, so don't
loop on stale data lookup attempts. If the DNS_DBFIND_STALEOK flag is
set, this means we already tried to lookup stale data, so if that is
the case, don't use stale again.
First of all, there was a flaw in the code related to the
'stale-refresh-time' option. If stale answers are enabled, and we
returned stale data, then it was assumed that it was because we were
in the 'stale-refresh-time' window. But now we could also have returned
stale data because of a 'stale-answer-client-timeout'. To fix this,
introduce a rdataset attribute DNS_RDATASETATTR_STALE_WINDOW to
indicate whether the stale cache entry was returned because the
'stale-refresh-time' window is active.
Second, remove the special case handling when the result is
DNS_R_NCACHENXRRSET. This can be done more generic in the code block
when dealing with stale data.
Putting all stale case handling in the code block when dealing with
stale data makes the code more easy to follow.
Update documentation to be more verbose and to match then new code
flow.
Both functions employed the same code lines to allocate query context
buffers, which are used to store query results, so this shared portion
of code was extracted out to a new function, qctx_prepare_buffers.
Also, this commit uses qctx_init to initialize the query context whitin
query_refresh_rrset function.
This commit allows stale RRset to be used (if available) for responding
a query, before an attempt to refresh an expired, or otherwise resolve
an unavailable RRset in cache is made.
For that to work, a value of zero must be specified for
stale-answer-client-timeout statement.
To better understand the logic implemented, there are three flags being
used during database lookup and other parts of code that must be
understood:
. DNS_DBFIND_STALEOK: This flag is set when BIND fails to refresh a
RRset due to timeout (resolver-query-timeout), its intent is to
try to look for stale data in cache as a fallback, but only if
stale answers are enabled in configuration.
This flag is also used to activate stale-refresh-time window, since it
is the only way the database knows that a resolution has failed.
. DNS_DBFIND_STALEENABLED: This flag is used as a hint to the database
that it may use stale data. It is always set during query lookup if
stale answers are enabled, but only effectively used during
stale-refresh-time window. Also during this window, the resolver will
not try to resolve the query, in other words no attempt to refresh the
data in cache is made when the stale-refresh-time window is active.
. DNS_DBFIND_STALEONLY: This new introduced flag is used when we want
stale data from the database, but not due to a failure in resolution,
it also doesn't require stale-refresh-time window timer to be active.
As long as there is a stale RRset available, it should be returned.
It is mainly used in two situations:
1. When stale-answer-client-timeout timer is triggered: in that case
we want to know if there is stale data available to answer the
client.
2. When stale-answer-client-timeout value is set to zero: in that
case, we also want to know if there is some stale RRset available
to promptly answer the client.
We must also discern between three situations that may happen when
resolving a query after the addition of stale-answer-client-timeout
statement, and how to handle them:
1. Are we running query_lookup() due to stale-answer-client-timeout
timer being triggered?
In this case, we look for stale data, making use of
DNS_DBFIND_STALEONLY flag. If a stale RRset is available then
respond the client with the data found, mark this query as
answered (query attribute NS_QUERYATTR_ANSWERED), so when the
fetch completes the client won't be answered twice.
We must also take care of not detaching from the client, as a
fetch will still be running in background, this is handled by the
following snippet:
if (!QUERY_STALEONLY(&client->query)) {
isc_nmhandle_detach(&client->reqhandle);
}
Which basically tests if DNS_DBFIND_STALEONLY flag is set, which
means we are here due to a stale-answer-client-timeout timer
expiration.
2. Are we running query_lookup() due to resolver-query-timeout being
triggered?
In this case, DNS_DBFIND_STALEOK flag will be set and an attempt
to look for stale data will be made.
As already explained, this flag is algo used to activate
stale-refresh-time window, as it means that we failed to refresh
a RRset due to timeout.
It is ok in this situation to detach from the client, as the
fetch is already completed.
3. Are we running query_lookup() during the first time, looking for
a RRset in cache and stale-answer-client-timeout value is set to
zero?
In this case, if stale answers are enabled (probably), we must do
an initial database lookup with DNS_DBFIND_STALEONLY flag set, to
indicate to the database that we want stale data.
If we find an active RRset, proceed as normal, answer the client
and the query is done.
If we find a stale RRset we respond to the client and mark the
query as answered, but don't detach from the client yet as an
attempt in refreshing the RRset will still be made by means of
the new introduced function 'query_resolve'.
If no active or stale RRset is available, begin resolution as
usual.
The general logic behind the addition of this new feature works as
folows:
When a client query arrives, the basic path (query.c / ns_query_recurse)
was to create a fetch, waiting for completion in fetch_callback.
With the introduction of stale-answer-client-timeout, a new event of
type DNS_EVENT_TRYSTALE may invoke fetch_callback, whenever stale
answers are enabled and the fetch took longer than
stale-answer-client-timeout to complete.
When an event of type DNS_EVENT_TRYSTALE triggers fetch_callback, we
must ensure that the folowing happens:
1. Setup a new query context with the sole purpose of looking up for
stale RRset only data, for that matters a new flag was added
'DNS_DBFIND_STALEONLY' used in database lookups.
. If a stale RRset is found, mark the original client query as
answered (with a new query attribute named NS_QUERYATTR_ANSWERED),
so when the fetch completion event is received later, we avoid
answering the client twice.
. If a stale RRset is not found, cleanup and wait for the normal
fetch completion event.
2. In ns_query_done, we must change this part:
/*
* If we're recursing then just return; the query will
* resume when recursion ends.
*/
if (RECURSING(qctx->client)) {
return (qctx->result);
}
To this:
if (RECURSING(qctx->client) && !QUERY_STALEONLY(qctx->client)) {
return (qctx->result);
}
Otherwise we would not proceed to answer the client if it happened
that a stale answer was found when looking up for stale only data.
When an event of type DNS_EVENT_FETCHDONE triggers fetch_callback, we
proceed as before, resuming query, updating stats, etc, but a few
exceptions had to be added, most important of which are two:
1. Before answering the client (ns_client_send), check if the query
wasn't already answered before.
2. Before detaching a client, e.g.
isc_nmhandle_detach(&client->reqhandle), ensure that this is the
fetch completion event, and not the one triggered due to
stale-answer-client-timeout, so a correct call would be:
if (!QUERY_STALEONLY(client)) {
isc_nmhandle_detach(&client->reqhandle);
}
Other than these notes, comments were added in code in attempt to make
these updates easier to follow.
These options were ancient or made obsolete a long time ago, it is
safe to remove them.
Also stop printing ancient options, they should be treated the same as
unknown options.
Removed options: lwres, geoip-use-ecs, sit-secret, use-ixfr,
acache-cleaning-interval, acache-enable, additional-from-auth,
additional-from-cache, allow-v6-synthesis, dnssec-enable,
max-acache-size, nosit-udp-size, queryport-pool-ports,
queryport-pool-updateinterval, request-sit, use-queryport-pool, and
support-ixfr.
Return value of dns_db_getservestalerefresh() and
dns_db_getservestalettl() functions were previously unhandled.
This commit purposefully ignore those return values since there is
no side effect if those results are != ISC_R_SUCCESS, it also supress
Coverity warnings.
previously query plugins were strictly synchrounous - the query
process would be interrupted at some point, data would be looked
up or a change would be made, and then the query processing would
resume immediately.
this commit enables query plugins to initiate asynchronous processes
and resume on a completion event, as with recursion.
several small changes to query processing to make it easier to
use hook-based recursion (and other asynchronous functionlity)
later.
- recursion quota check is now a separate function,
check_recursionquota(), which is called by ns_query_recurse().
- pass isc_result to query_nxdomain() instead of bool.
the value of 'empty_wild' will be determined in the function
based on the passed result. this is similar to query_nodata(),
and makes the signatures of the two functions more consistent.
- pass the current 'result' value into plugin hooks.
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution fails for any reason, only then BIND would
check for stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' is on).
The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed in the next seconds.
A better approach to improve performance in those cases, is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.
Only when this interval expires we then try to do a normal refresh of
the rrset.
The logic behind this commit is as following:
- In query.c / query_gotanswer(), if the test of 'result' variable falls
to the default case, an error is assumed to have happened, and a call
to 'query_usestale()' is made to check if serving of stale rrset is
enabled in configuration.
- If serving of stale answers is enabled, a flag will be turned on in
the query context to look for stale records:
query.c:6839
qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;
- A call to query_lookup() will be made again, inside it a call to
'dns_db_findext()' is made, which in turn will invoke rbdb.c /
cache_find().
- In rbtdb.c / cache_find() the important bits of this change is the
call to 'check_stale_header()', which is a function that yields true
if we should skip the stale entry, or false if we should consider it.
- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
is set, if that is the case we know that this new search for stale
records was made due to a failure in a normal resolution, so we keep
track of the time in which the failured occured in rbtdb.c:4559:
header->last_refresh_fail_ts = search->now;
- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
know this is a normal lookup, if the record is stale and the query
time is between last failure time + stale-refresh-time window, then
we return false so cache_find() knows it can consider this stale
rrset entry to return as a response.
The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh
Those were added so rbtdb can be aware of the value set in configuration
option, since in that level we have no access to the view object.
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:
1. Original query context is created.
2. An AAAA RRset is found in cache.
3. Client-specific data is allocated from the filter-aaaa memory pool.
4. Recursion is triggered for an A RRset.
5. Original query context is torn down.
6. Recursion for an A RRset completes.
7. A second query context is created.
8. Client-specific data is retrieved from the filter-aaaa memory pool.
9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.
However, steps 6-12 are not executed if recursion for an A RRset is
canceled. Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released. This in turn
leads to a crash since every memory pool must not have any outstanding
allocations by the time isc_mempool_destroy() is called.
Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call. Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records. The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
control channel connection objects now contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- cmdhandle: held while an rndc command is running
httpd connections contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
The basic scenario for the problem was that in the process of
resolving a query, if any rrset was eligible for prefetching, then it
would trigger a call to query_prefetch(), this call would run in
parallel to the normal query processing.
The problem arises due to the fact that both query_prefetch(), and,
in the original thread, a call to ns_query_recurse(), try to attach
to the recursionquota, but recursing client stats counter is only
incremented if ns_query_recurse() attachs to it first.
Conversely, if fetch_callback() is called before prefetch_done(),
it would not only detach from recursionquota, but also decrement
the stats counter, if query_prefetch() attached to te quota first
that would result in a decrement not matched by an increment, as
expected.
To solve this issue an atomic bool was added, it is set once in
ns_query_recurse(), allowing fetch_callback() to check for it
and decrement stats accordingly.
For a more compreensive explanation check the thread comment below:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1719#note_145857
- clone keynode->dsset rather than return a pointer so that thread
use is independent of each other.
- hold a reference to the dsset (keynode) so it can't be deleted
while in use.
- create a new keynode when removing DS records so that dangling
pointers to the deleted records will not occur.
- use a rwlock when accessing the rdatalist to prevent instabilities
when DS records are added.
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
In case of normal fetch, the .recursionquota is attached and
ns_statscounter_recursclients is incremented when the fetch is created. Then
the .recursionquota is detached and the counter decremented in the
fetch_callback().
In case of prefetch or rpzfetch, the quota is attached, but the counter is not
incremented. When we reach the soft-quota, the function returns early but don't
detach from the quota, and it gets destroyed during the ns_client_endrequest(),
so no memory was leaked.
But because the ns_statscounter_recursclients is only incremented during the
normal fetch the counter would be incorrectly decremented on two occassions:
1) When we reached the softquota, because the quota was not properly detached
2) When the prefetch or rpzfetch was cancelled mid-flight and the callback
function was never called.
This new option was added to fill a gap in RPZ configuration
options.
It was possible to instruct BIND wheter NSIP rewritting rules would
apply or not, as long as the required data was already in cache or not,
respectively, by means of the option nsip-wait-recurse.
A value of yes (default) could incur a little processing cost, since
BIND would need to recurse to find NS addresses in case they were not in
the cache.
This behavior could be changed by setting nsip-wait-recurse value to no,
in which case BIND would promptly return some error code if the NS IP addresses
data were not in cache, then BIND would start a recursive query
in background, so future similar requests would have the required data
(NS IPs) in cache, allowing BIND to apply NSIP rules accordingly.
A similar feature wasn't available for NSDNAME triggers, so this commit
adds the option nsdname-wait-recurse to fill this gap, as it was
expected by couple BIND users.
this corrects some style glitches such as:
```
long_function_call(arg, arg2, arg3, arg4, arg5, "str"
"ing");
```
...by adjusting the penalties for breaking strings and call
parameter lists.
Both clang-tidy and uncrustify chokes on statement like this:
for (...)
if (...)
break;
This commit uses a very simple semantic patch (below) to add braces around such
statements.
Semantic patch used:
@@
statement S;
expression E;
@@
while (...)
- if (E) S
+ { if (E) { S } }
@@
statement S;
expression E;
@@
for (...;...;...)
- if (E) S
+ { if (E) { S } }
@@
statement S;
expression E;
@@
if (...)
- if (E) S
+ { if (E) { S } }
The isc_buffer_allocate() function now cannot fail with ISC_R_MEMORY.
This commit removes all the checks on the return code using the semantic
patch from previous commit, as isc_buffer_allocate() now returns void.
as initial-key and static-key trust anchors will now be stored as a
DS rrset, code referencing keynodes storing DNSKEY trust anchors will
no longer be reached.
- ns__client_request() is now called by netmgr with an isc_nmhandle_t
parameter. The handle can then be permanently associated with an
ns_client object.
- The task manager is paused so that isc_task events that may be
triggred during client processing will not fire until after the netmgr is
finished with it. Before any asynchronous event, the client MUST
call isc_nmhandle_ref(client->handle), to prevent the client from
being reset and reused while waiting for an event to process. When
the asynchronous event is complete, isc_nmhandle_unref(client->handle)
must be called to ensure the handle can be reused later.
- reference counting of client objects is now handled in the nmhandle
object. when the handle references drop to zero, the client's "reset"
callback is used to free temporary resources and reiniialize it,
whereupon the handle (and associated client) is placed in the
"inactive handles" queue. when the sysstem is shutdown and the
handles are cleaned up, the client's "put" callback is called to free
all remaining resources.
- because client allocation is no longer handled in the same way,
the '-T clienttest' option has now been removed and is no longer
used by any system tests.
- the unit tests require wrapping the isc_nmhandle_unref() function;
when LD_WRAP is supported, that is used. otherwise we link a
libwrap.so interposer library and use that.
This commit was done by hand to add the RUNTIME_CHECK() around stray
dns_name_copy() calls with NULL as third argument. This covers the edge cases
that doesn't make sense to write a semantic patch since the usage pattern was
unique or almost unique.
This second commit uses second semantic patch to replace the calls to
dns_name_copy() with NULL as third argument where the result was stored in a
isc_result_t variable. As the dns_name_copy(..., NULL) cannot fail gracefully
when the third argument is NULL, it was just a bunch of dead code.
Couple of manual tweaks (removing dead labels and unused variables) were
manually applied on top of the semantic patch.
This commit add RUNTIME_CHECK() around all simple dns_name_copy() calls where
the third argument is NULL using the semantic patch from the previous commit.
For some libc implementations, BUFSIZ is small enough (e.g. 1024 for
musl libc) to trigger compilation warnings about insufficient size of
certain buffers. Since the relevant buffers are used for printing DNS
names, increase their size to '(n + 1) * DNS_NAME_FORMATSIZE', where 'n'
is the number of DNS names which are printed to a given buffer. This
results in somewhat arbitrary, albeit nicely-aligned and large enough
buffer sizes.
- when processing authoritative queries for ./NS, set 'gluedb' so
that glue will be included in the response, regardless of how
'minimal-responses' has been configured.
We increase recursclients when we attach to recursion quota,
decrease when we detach. In some cases, when we hit soft
quota, we might attach to quota without increasing recursclients
gauge. We then decrease the gauge when we detach from quota,
and it causes the statistics to underflow.
Fix makes sure that we increase recursclients always when we
succesfully attach to recursion quota.
qname minimization, even in relaxed mode, can fail on
some very broken domains. In relaxed mode, instead of
asking for "foo.bar NS" ask for "_.foo.bar A" to either
get a delegation or NXDOMAIN. It will require more queries
than regular mode for proper NXDOMAINs.
- Always set is_zonep in query_getdb; previously it was only set if
result was ISC_R_SUCCESS or ISC_R_NOTFOUND.
- Don't reset is_zone for redirect.
- Style cleanup.
(cherry picked from commit a85cc641d7a4c66cbde03cc4e31edc038a24df46)
(cherry picked from commit 486a201149)
- named could return FORMERR if parsing iterative responses
ended with a result code such as DNS_R_OPTERR. instead of
computing a response code based on the result, in this case
we now just force the response to be SERVFAIL.
this restores functionality that was removed in commit 03be5a6b4e,
allowing named to search in authoritative zone databases outside the
current zone for additional data, if and only if recursion is allowed
and minimal-responses is disabled.
in query_respond_any(), the assumption had previously been made that it
was impossible to get past iterating the node with a return value of
ISC_R_NOMORE but not have found any records, unless we were searching
for RRSIG or SIG. however, it is possible for other types to exist but
be hidden, such as when the zone is transitioning from insecure to
secure and DNSSEC types are encountered, and this situation could
trigger an assertion. removed the assertion and reorganized the code.
This change makes rndc dumpdb correctly print the "; stale" line.
It also provides extra information on how long this data may still
be served to clients (in other words how long the stale RRset may
still be used).
- "hook" is now used only for hook points and hook actions
- the "hook" statement in named.conf is now "plugin"
- ns_module and ns_modlist are now ns_plugin and ns_plugins
- ns_module_load is renamed ns_plugin_register
- the mandatory functions in plugin modules (hook_register,
hook_check, hook_version, hook_destroy) have been renamed
- added some hook points that will be needed for a dns64 module later
- moved some code from the beginning of query_respond() to
the end of query_prepresponse(); this has no effect on functionality
but means we can have a hook point at the top of query_respond(),
which seems nicer
- compressed duplicated code into query_zerottl_refetch() function
- added a qctx->answered flag so that a module can prevent
query_addrrset() from being called from query_respond() when
it's already been called from the module.
- this is necessary because adding the same hook to multiple views
causes the ISC_LIST link value to become inconsistent; it isn't
noticeable when only one hook action is ever registered at a
given hook point, but it will break things when there are two.
- eliminate qctx->hookdata and client->hookflags.
- use a memory pool to allocate data blobs in the filter-aaaa module,
and associate them with the client address in a hash table
- instead of detaching the client in query_done(), mark it for deletion
and then call ns_client_detach() from qctx_destroy(); this ensures
that it will still exist when the QCTX_DESTROYED hook point is
reached.
- use a get_hooktab() function to determine the hook table.
- PROCESS_HOOK now jumps to a cleanup tag on failure
- add PROCESS_ALL_HOOKS in query.c, to run all hook functions at
a specified hook point without stopping. this is to be used for
intiialization and destruction functions that must run in every
module.
- 'result' is set in PROCESS_HOOK only when a hook function
interrupts processing.
- revised terminology: a "callback" is now a "hook action"
- remove unused NS_PROCESS_HOOK and NS_PROCESS_HOOK_VOID macros.
- added a 'hookdata' array to qctx to store pointers to up to
16 blobs of data which are allocated by modules as needed.
each module is assigned an ID number as it's loaded, and this
is the index into the hook data array. this is to be used for
holding persistent state between calls to a hook module for a
specific query.
- instead of using qctx->filter_aaaa, we now use qctx->hookdata.
(this was the last piece of filter-aaaa specific code outside the
module.)
- added hook points for qctx initialization and destruction. we get
a filter-aaaa data pointer from the mempool when initializing and
store it in the qctx->hookdata table; return to to the mempool
when destroying the qctx.
- link the view to the qctx so that detaching the client doesn't cause
hooks to fail
- added a qctx_destroy() function which must be called after qctx_init;
this calls the QCTX_DESTROY hook and detaches the view
- general cleanup and comments
- make some cfg-parsing functions global so they can be run
from filter-aaaa.so
- add filter-aaaa options to the hook module's parser
- mark filter-aaaa options in named.conf as obsolete, remove
from named and checkconf, and update the filter-aaaa test not to
use checkconf anymore
- remove filter-aaaa-related struct members from dns_view
- allow multiple "hook" statements at global or view level
- add "optional bracketed text" type for optional parameter list
- load hook module from specified path rather than hardcoded path
- add a hooktable pointer (and a callback for freeing it) to the
view structure
- change the hooktable functions so they no longer update ns__hook_table
by default, and modify PROCESS_HOOK so it uses the view hooktable, if
set, rather than ns__hook_table. (ns__hook_table is retained for
use by unit tests.)
- update the filter-aaaa system test to load filter-aaaa.so
- add a prereq script to check for dlopen support before running
the filter-aaaa system test
not yet done:
- configuration parameters are not being passed to the filter-aaaa
module; the filter-aaaa ACL and filter-aaaa-on-{v4,v6} settings are
still stored in dns_view
- temporary kluge! in this version, for testing purposes,
named always searches for a filter-aaaa module at /tmp/filter-aaaa.so.
this enables the filter-aaaa system test to run even though the
code to configure hooks in named.conf hasn't been written yet.
- filter-aaaa-on-v4, filter-aaaa-on-v6 and the filter-aaaa ACL are
still configured in the view as they were before, not in the hook.
- these formerly static helper functions have been moved into client.c
and made external so that they can be used in hook modules as well as
internally in libns: query_newrdataset, query_putrdataset,
query_newnamebuf, query_newname, query_getnamebuf, query_keepname,
query_releasename, query_newdbversion, query_findversion
- made query_recurse() and query_done() into public functions
ns_query_recurse() and ns_query_done() so they can be called from
modules.
- the goal of this change is for AAAA filtering to be fully contained
in the query logic, and implemented at discrete points that can be
replaced with hook callouts later on.
- the new code may be slightly less efficient than the old filter-aaaa
implementation, but maximum efficiency was never a priority for AAAA
filtering anyway.
- we now use the rdataset RENDERED attribute to indicate that an AAAA
rdataset should not be included when rendering the message. (this
flag was originally meant to indicate that an rdataset has already
been rendered and should not be repeated, but it can also be used to
prevent rendering in the first place.)
- the DNS_MESSAGERENDER_FILTER_AAAA, NS_CLIENTATTR_FILTER_AAAA,
and DNS_RDATASETGLUE_FILTERAAAA flags are all now unnecessary and
have been removed.
- the purpose of this change is allow for more well-defined hook points
to be available in the query processing logic. some functions that
formerly didn't have access to 'qctx' do now; this is needed because
'qctx' is what gets passed when calling a hook function.
- query_addrdataset() has been broken up into three separate functions
since it used to do three unrelated things, and what was formerly
query_addadditional() has been renamed query_additional_cb() for
clarity.
- client->filter_aaaa is now qctx->filter_aaaa. (later, it will be moved
into opaque storage in the qctx, for use by the filter-aaaa module.)
- cleaned up style and braces
- move hooks.h to public include directory
- ns_hooktable_init() initializes a hook table. if NULL is passed in, it
initializes the global hook table
- ns_hooktable_save() saves a pointer to the current global hook table.
- ns_hooktable_reset() replaces the global hook table with different
one
- ns_hook_add() adds hooks at specified hook points in a hook table (or
the global hook table if the specified table is NULL)
- load and unload functions support dlopen() of hook modules (this is
adapted from dyndb and not yet functional)
- began adding new hook points to query.c
While implementing the new unit testing framework cmocka, it was found that the
BIND 9 code doesn't compile when assertions are disabled or replaced with any
function (such as mock_assert() from cmocka unit testing framework) that's not
directly recognized as assertion by the compiler.
This made the compiler to complain about blocks of code that was recognized as
unreachable before, but now it isn't.
The changes in this commit include:
* assigns default values to couple of local variables,
* moves some return statements around INSIST assertions,
* adds __builtin_unreachable(); annotations after some INSIST assertions,
* fixes one broken assertion (= instead of ==)
Use a zone's 'type' field instead of the value of its DNS_ZONEOPT_MIRROR
option for checking whether it is a mirror zone. This makes said zone
option and its associated helper function, dns_zone_mirror(), redundant,
so remove them. Remove a check specific to mirror zones from
named_zone_reusable() since another check in that function ensures that
changing a zone's type prevents it from being reused during
reconfiguration.
Rather than overloading dns_zone_slave and discerning between a slave
zone and a mirror zone using a zone option, define a separate enum
value, dns_zone_mirror, to be used exclusively by mirror zones. Update
code handling slave zones to ensure it also handles mirror zones where
applicable.
Commit ba91243542 causes the resolver to
respond to a client query with FORMERR when all upstream queries sent to
the servers authoritative for QNAME elicit FORMERR responses. This
happens because resolver code returns DNS_R_FORMERR in such a case and
dns_result_torcode() acts as a pass-through for all arguments which are
already a valid RCODE.
The correct RCODE to set in the response returned to the client in the
case described above is SERVFAIL. Make sure this happens by overriding
the RCODE in query_gotanswer(), on the grounds that any format errors in
the client query itself should be caught long before execution reaches
that point. This change should not reduce query error logging accuracy
as the resolver code itself reports the exact reason for returning a
DNS_R_FORMERR result using log_formerr().
In some cases, setting qctx->result to DNS_R_SERVFAIL causes the value
of a 'result' variable containing a more specific failure reason to be
effectively discarded. This may cause certain query error log messages
to lack specificity despite a more accurate problem cause being
determined during query processing.
In other cases, qctx->result is set to DNS_R_SERVFAIL even though a more
specific error (e.g. ISC_R_NOMEMORY) could be explicitly indicated.
Since the response message's RCODE is derived from qctx->result using
dns_result_torcode(), which handles a number of possible isc_result_t
values and returns SERVFAIL for anything not explicitly listed, it is
fine to set qctx->result to something more specific than DNS_R_SERVFAIL
(in fact, this is already being done in a few cases). Modify most
QUERY_ERROR() calls so that qctx->result is set to a more specific error
code when possible. Adjust query_error() so that statistics are still
calculated properly. Remove the RECURSE_ERROR() macro which was
introduced exactly because qctx->result could be set to DNS_R_SERVFAIL
instead of DNS_R_DUPLICATE or DNS_R_DROP, which need special handling.
Modify dns_sdlz_putrr() so that it returns DNS_R_SERVFAIL when a DLZ
driver returns invalid RDATA, in order to prevent setting RCODE to
FORMERR (which is what dns_result_torcode() translates e.g. DNS_R_SYNTAX
to) while responding authoritatively.
When something goes wrong while recursing for an answer to a query,
query_gotanswer() sets a flag (qctx->want_stale) in the query context.
query_done() is subsequently called and it can either set up a stale
response lookup (if serve-stale is enabled) or conclude that a SERVFAIL
response should be sent. This may cause confusion when looking at query
error logs since the QUERY_ERROR() line responsible for setting the
response's RCODE to SERVFAIL is not in a catch-all branch of a switch
statement inside query_gotanswer() (like it is for authoritative
responses) but rather in a code branch which appears to have something
to do with serve-stale, even when the latter is not enabled.
Extract the part of query_done() responsible for checking serve-stale
configuration and optionally setting up a stale response lookup into a
separate function, query_usestale(), shifting the responsibility for
setting the response's RCODE to SERVFAIL to the same QUERY_ERROR() line
in query_gotanswer() which is evaluated for authoritative responses.
Changes introduced by the previous two commits make the parts of
query_delegation() and query_zone_delegation() which prepare a
delegation response functionally equivalent. Extract this code into a
separate function, query_prepare_delegation_response(), and then call
the latter from both query_delegation() and query_zone_delegation() in
order to reduce code duplication. Add a comment describing the purpose
of the extracted code. Fix coding style issues.
When query processing hits a delegation from a locally configured zone,
an attempt may be made to look for a better answer in the cache. In
such a case, the zone-sourced delegation data is set aside and the
lookup is retried using the cache database. When that lookup is
completed, a decision is made whether the answer found in the cache is
better than the answer found in the zone.
Currently, if the zone-sourced answer turns out to be better than the
one found in the cache:
- qctx->zdb is not restored into qctx->db,
- qctx->node, holding the zone database node found, is not even saved.
Thus, in such a case both qctx->db and qctx->node will point at cache
data. This is not an issue for BIND versions which do not support
mirror zones because in these versions non-recursive queries always
cause the zone-sourced delegation to be returned and thus the
non-recursive part of query_delegation() is never reached if the
delegation is coming from a zone. With mirror zones, however,
non-recursive queries may cause cache lookups even after a zone
delegation is found. Leaving qctx->db assigned to the cache database
when query_delegation() determines that the zone-sourced delegation is
the best answer to the client's query prevents DS records from being
added to delegations coming from mirror zones. Fix this issue by
keeping the zone database and zone node in qctx while the cache is
searched for an answer and then restoring them into qctx->db and
qctx->node, respectively, if the zone-sourced delegation turns out to be
the best answer. Since this change means that qctx->zdb cannot be used
as the glue database any more as it will be reset to NULL by RESTORE(),
ensure that qctx->db is not a cache database before attaching it to
qctx->client->query.gluedb.
Furthermore, current code contains a conditional statement which
prevents a mirror zone from being used as a source of glue records.
Said statement was added to prevent assertion failures caused by
attempting to use a zone database's glue cache for finding glue for an
NS RRset coming from a cache database. However, that check is overly
strict since it completely prevents glue from being added to delegations
coming from mirror zones. With the changes described above in place,
the scenario this check was preventing can no longer happen, so remove
the aforementioned check.
If qctx->zdb is not NULL, qctx->zfname will also not be NULL;
qctx->zsigrdataset may be NULL in such a case, but query_putrdataset()
handles pointers to NULL pointers gracefully. Remove redundant
conditional expressions to make the cleanup code in query_freedata()
match the corresponding sequences of SAVE() / RESTORE() macros more
closely.
As mirror zone data should be treated the way validated, cached DNS
responses are, it should not be used when responding to clients who are
not allowed cache access. Reuse code responsible for determining cache
database access for evaluating mirror zone access.
Modify query_checkcacheaccess() so that it only contains a single return
statement rather than three and so that the "check_acl" variable is no
longer needed. Tweak and expand comments. Fix coding style issues.
Modify query_getcachedb() so that it uses a common return path for both
success and failure. Remove a redundant NULL check since 'db' will
never be NULL after being passed as a target pointer to dns_db_attach().
Fix coding style issues.
Extract the parts of query_getcachedb() responsible for checking whether
the client is allowed to access the cache to a separate function, so
that it can be reused for determining mirror zone access.
If transferring or loading a mirror zone fails, resolution should still
succeed by means of falling back to regular recursive queries.
Currently, though, if a slave zone is present in the zone table and not
loaded, a SERVFAIL response is generated. Thus, mirror zones need
special handling in this regard.
Add a new dns_zt_find() flag, DNS_ZTFIND_MIRROR, and set it every time a
domain name is looked up rather than a zone itself. Handle that flag in
dns_zt_find() in such a way that a mirror zone which is expired or not
yet loaded is ignored when looking up domain names, but still possible
to find when the caller wants to know whether the zone is configured.
This causes a fallback to recursion when mirror zone data is unavailable
without making unloaded mirror zones invisible to code checking a zone's
existence.
Section 4 of RFC 7706 suggests that responses sourced from a local copy
of a zone should not have the AA bit set. Follow that recommendation by
setting 'qctx->authoritative' to ISC_FALSE when a response to a query is
coming from a mirror zone.
When a resolver is a regular slave (i.e. not a mirror) for some zone,
non-recursive queries for names below that slaved zone will return a
delegation sourced from it. This behavior is suboptimal for mirror
zones as their contents should rather be treated as validated, cached
DNS responses. Modify query_delegation() and query_zone_delegation() to
permit clients allowed cache access to check its contents for a better
answer when responding to non-recursive queries.
Interrupt query processing when query_recurse() attempts to ask the same
name servers for the same QNAME/QTYPE tuple for two times in a row as
this indicates that query processing may be stuck for an indeterminate
period of time, e.g. due to interactions between features able to
restart query_lookup().
Replace dns_fixedname_init() calls followed by dns_fixedname_name()
calls with calls to dns_fixedname_initname() where it is possible
without affecting current behavior and/or performance.
This patch was mostly prepared using Coccinelle and the following
semantic patch:
@@
expression fixedname, name;
@@
- dns_fixedname_init(&fixedname);
...
- name = dns_fixedname_name(&fixedname);
+ name = dns_fixedname_initname(&fixedname);
The resulting set of changes was then manually reviewed to exclude false
positives and apply minor tweaks.
It is likely that more occurrences of this pattern can be refactored in
an identical way. This commit only takes care of the low-hanging fruit.
CNAME between to served zones when recursion was
desired and available (RD=1, RA=1). Don't return
the CNAME target otherwise to prevent accidental
cache poisoning. [RT #47078]
4724. [func] By default, BIND now uses the random number
functions provided by the crypto library (i.e.,
OpenSSL or a PKCS#11 provider) as a source of
randomness rather than /dev/random. This is
suitable for virtual machine environments
which have limited entropy pools and lack
hardware random number generators.
This can be overridden by specifying another
entropy source via the "random-device" option
in named.conf, or via the -r command line option;
however, for functions requiring full cryptographic
strength, such as DNSSEC key generation, this
cannot be overridden. In particular, the -r
command line option no longer has any effect on
dnssec-keygen.
This can be disabled by building with
"configure --disable-crypto-rand".
[RT #31459] [RT #46047]
Stage 2 - synthesis of records from wildcard data.
If the dns64 or filter-aaaa* is configured then the
involved lookups are currently excluded. [RT #40138]
4713. [func] Added support for the DNS Response Policy Service
(DNSRPS) API, which allows named to use an external
response policy daemon when built with
"configure --enable-dnsrps". Thanks to Vernon
Schryver and Farsight Security. [RT #43376]
4708. [cleanup] Legacy Windows builds (i.e. for XP and earlier)
are no longer supported. [RT #45186]
4707. [func] The lightweight resolver daemon and library (lwresd
and liblwres) have been removed. [RT #45186]
4706. [func] Code implementing name server query processing has
been moved from bin/named to a new library "libns".
Functions remaining in bin/named are now prefixed
with "named_" rather than "ns_". This will make it
easier to write unit tests for name server code, or
link name server functionality into new tools.
[RT #45186]