because dns_qpmulti_commit() can be time consuming, it's inefficient
to open and commit a qpmulti transaction for each rdataset being loaded
into a database. we can improve load time by opening a qpmulti
transaction before adding a group of rdatasets and then committing it
afterward.
this commit adds 'setup' and 'commit' functions to dns_rdatacallbacks_t,
which can be called before and after the loops in which 'add' is
called in dns_master_load() and axfr_apply().
add database API method implementations needed to iterate and dump
a qpzone database to a file (createiterator, allrdatasets and
attachversion, plus dbiterator and rdatasetiter methods).
named-checkzone -D can now dump the contents of most zones,
but zone cuts are not correctly detected.
the dyndb test requires a mechanism to retrieve the name associated
with a database node, and since the database no longer uses RBT for
its underlying storage, dns_rbt_fullnamefromnode() doesn't work.
addressed this by adding dns_db_nodefullname() to the database API.
- Copy rbtdb.c, rbt-zonedb.c and rbt-cachedb.c to qp-*.
- Added qpmethods.
- Added a new structure dns_qpdata that will replace dns_rbtnode.
- Replaced normal, nsec, and nsec3 dns_rbt trees with dns_qp tries.
- Replaced dns_rbt_create() calls with dns_qp_create().
- Replaced the dns_rbt_destroy() call with dns_qp_destroy().
- Create a dns_qpdata struct and create/destroy methods.
This commit will not build.
The log message for commit 24381cc36d
explained:
In some older BIND 9 branches, the extra queuing overhead eliminated by
this change could be remotely exploited to cause excessive memory use.
Due to architectural shift, this branch is not vulnerable to that issue,
but applying the fix to the latter is nevertheless deemed prudent for
consistency and to make the code future-proof.
However, it turned out that having a single queue for the nodes to be
pruned increased lock contention to a level where cleaning up nodes from
the RBTDB took too long, causing the amount of memory used by the cache
to grow indefinitely over time.
This commit reverts the change to the pruning mechanism introduced by
commit 24381cc36d as BIND branches newer
than 9.16 were not affected by the excessive event queueing overhead
issue mentioned in the log message for the above commit.
- the DNS_DB_NSEC3ONLY and DNS_DB_NONSEC3 flags are mutually
exclusive; it never made sense to set both at the same time.
to enforce this, it is now a fatal error to do so. the
dbiterator implementation has been cleaned up to remove
code that treated the two as independent: if nonsec3 is
true, we can be certain nsec3only is false, and vice versa.
- previously, iterating a database backwards omitted
NSEC3 records even if DNS_DB_NONSEC3 had not been set. this
has been corrected.
- when an iterator reaches the origin node of the NSEC3 tree, we
need to skip over it and go to the next node in the sequence.
the NSEC3 origin node is there for housekeeping purposes and
never contains data.
- the dbiterator_test unit test has been expanded, several
incorrect expectations have been fixed. (for example, the
expected number of iterations has been reduced by one; we were
previously counting the NSEC3 origin node and we should not
have been doing so.)
- create_node() in rbt.c cannot fail
- the dns_rbt_*name() functions, which are wrappers around
dns_rbt_[add|find|delete]node(), were never used except in tests.
this change isn't really necessary since RBT is likely to go away
eventually anyway. but keeping the API as simple as possible while it
persists is a good thing, and may reduce confusion while QPDB is being
developed from RBTDB code.
these values pertain to whether a node is in the main, nsec, or nsec3
tree of an RBTDB. they need to be moved to a more generic location so
they can also be used by QPDB.
(this is in db.h rather than db_p.h because rbt.c needs access to it.
technically, that's a layer violation, but it's a long-existing one;
refactoring to get rid of it would be a large hassle, and eventually
we expect to remove rbt.c anyway.)
when the QPDB is implemented, we will need to have both qpdb_p.h and
rbtdb_p.h. in order to prevent name collisions or code duplication,
this commit adds a generic private header file, db_p.h, containing
structures and macros that will be used by both databases.
some functions and structs have been renamed to more specifically refer
to the RBT database, in order to avoid namespace collision with similar
things that will be needed by the QPDB later.
Expose the newly added 'first refresh' flag in the information
provided by the 'rndc staus' command, by showing the number of
zones, which are not yet fully ready, and their first refresh
is pending or is in-progress.
Add a new zone flag to indicate that a secondary type zone is
not yet fully ready, and a first time refresh is pending or is
in progress.
Expose this new flag in the statistics channel's "Incoming Zone
Transfers" section.
Instead of running all the cryptographic validation in a tight loop,
spread it out into multiple event loop "ticks", but moving every single
validation into own isc_async_run() asynchronous event. Move the
cryptographic operations - both verification and DNSKEY selection - to
the offloaded threads (isc_work_enqueue), this further limits the time
we spend doing expensive operations on the event loops that should be
fast.
Limit the impact of invalid or malicious RRSets that contain crafted
records causing the dns_validator to do many validations per single
fetch by adding a cap on the maximum number of validations and maximum
number of validation failures that can happen before the resolving
fails.
This is now the default way to implement attaching to/detaching from
a pointer.
Also update cfg_keystore_fromconfig() to allow NULL value for the
keystore pointer. In most cases we detach it immediately after the
function call.
Add a default key-directory parameter to the function that can
be returned if there is no keystore, or if the keystore directory
is NULL (the latter is also true for the built-in keystore).
When using the same PKCS#11 URI for a zone that uses different
DNSSEC policies, the PKCS#11 label could collide, i.e. the same
label could be used for different keys. Add the policy name to
the label to make it more unique.
Also, the zone name could contain characters that are interpreted
as special characters when parsing the PKCS#11 URI string. Mangle
the zone name through 'dns_name_tofilenametext()' to make it
PKCS#11 safe.
Move the creation to a separate function for clarity.
Furthermore, add a log message whenever a PKCS#11 object has been
successfully created.
The pkcs11-provider has changed to take a PKCS#11 URI instead of an
object identifier. Change the BIND 9 code accordingly to pass through
the label instead of just the object identifier.
See: https://github.com/latchset/pkcs11-provider/pull/284
Move dns_dnssec_findzonekeys from the dnssec.{c,h} source code to
zone.{c,h} (the header file already commented that this should be done
inside dns_zone_t).
Alter the function in such a way, that keys are searched for in the
key stores if a 'dnssec-policy' (kasp) is attached to the zone,
otherwise keep using the zone's key-directory.
When writing key files to disk, use the internally stored directory.
Add an access function 'dst_key_directory()'.
Most calls to keymgr functions no longer need to provide the
key-directory value. Only 'dns_keymgr_run' still needs access to
the zone's key-directory in case the key-store is set to the built-in
key-directory.
Refactor dns_dnssec_findmatchingkeys and dns_dnssec_keylistfromrdataset
to take into account the key store directories in case the zone is using
dnssec-policy (kasp). Add 'kasp' and 'keystores' parameters.
This requires the keystorelist to be stored inside the zone structure.
The calls to these functions in the DNSSEC tools can use NULL as the
kasp value, as dnssec-signzone does not (yet) support dnssec-policy,
and key collision is checked inside the directory where it is created.
If there is a keystore configured with a PKCS#11 URI, zones that
are using a dnssec-policy that uses such a keystore should create keys
via the PKCS#11 interface. Those keys are generally stored inside an
HSM.
Some changes to the code are required, to store the engine reference
into the keystore.
When creating the kasp structure, instead of storing the name of the
key store on keys, store a reference to the key store object instead.
This requires to build the keystore list prior to creating the kasp
structures, in the dnssec tools, the check code and the server code.
We will create a builtin keystore called "key-directory" which means
use the zone's key-directory as the key store.
The check code changes, because now the keystore is looked up before
creating the kasp structure (and if the keystore is not found, this
is an error). Instead of looking up the keystore after all
'dnssec-policy' clauses have been read.
Add checkconf check to ensure that the used key-store in the keys
section exists. Error if that is not the case. We also don't allow
the special keyword 'key-directory' as that is internally used to
signal that the zone's key-directory should be used.
Add code for configuring keystore objects. Add this to the "kaspconf"
code, as it is related to 'dnssec-policy' and it is too small to create
a separate file for it.
Instead of issuing a separate isc_async_run() call for every RBTDB node
that triggers tree pruning, maintain a list of nodes from which tree
pruning can be started from and only issue an isc_async_run() call if
pruning has not yet been triggered by another RBTDB node.
In some older BIND 9 branches, the extra queuing overhead eliminated by
this change could be remotely exploited to cause excessive memory use.
Due to architectural shift, this branch is not vulnerable to that issue,
but applying the fix to the latter is nevertheless deemed prudent for
consistency and to make the code future-proof.
When parsing messages use a hashmap instead of a linear search to reduce
the amount of work done in findname when there's more than one name in
the section.
There are two hashmaps:
1) hashmap for owner names - that's constructed for each section when we
hit the second name in the section and destroyed right after parsing
that section;
2) per-name hashmap - for each name in the section, we construct a new
hashmap for that name if there are more than one rdataset for that
particular name.
Multiple zones should be able to read the same key and signing policy
at the same time. Since writing the kasp lock only happens during
reconfiguration, and the complete kasp list is being replaced, there
is actually no need for a lock. Reference counting ensures that a kasp
structure is not destroyed when still being attached to one or more
zones.
This significantly improves the load configuration time.
As we are in overmem state we want to free more memory than we are
adding so we need to add in an allowance for the rbtnodes that may
have been added and the names stored with them. There is the node
for the owner name and a possible ENT node if there was a node split.
the 'predecessor' argument to dns_qp_lookup() turns out not to
be sufficient for our needs: the predecessor node in a QP database
could have become "empty" (for the current version) because of an
update or because cache data expired, and in that case the caller
would have to iterate more than one step back to find the predecessor
node that it needs.
it may also be necessary for a caller to iterate forward, in
order to determine whether a node has any children.
for both of these reasons, we now replace the 'predecessor'
argument with an 'iter' argument. if set, this points to memory
with enough space for a dns_qpiter object.
when an exact match is found by the lookup, the iterator will be
pointing to the matching node. if not, it will be pointing to the
lexical predecessor of the nae that was searched for.
a dns_qpiter_current() method has been added for examining
the current value of the iterator without moving it in either
direction.
The main intention of PROXY protocol is to pass endpoints information
to a back-end server (in our case - BIND). That means that it is a
valid way to spoof endpoints information, as the addresses and ports
extracted from PROXYv2 headers, from the point of view of BIND, are
used instead of the real connection addresses.
Of course, an ability to easily spoof endpoints information can be
considered a security issue when used uncontrollably. To resolve that,
we introduce 'allow-proxy' and 'allow-proxy-on' ACL options. These are
the only ACL options in BIND that work with real PROXY connections
addresses, allowing a DNS server operator to specify from what clients
and on which interfaces he or she is willing to accept PROXY
headers. By default, for security reasons we do not allow to accept
them.
BIND 9 will now treat the response as insecure when processing NSEC3
records with iterations larger than 50.
Earlier, we limited the number of iterations to 150 (in #2445).
RFC 9276 says: Because there has been a large growth of open (public)
DNSSEC validating resolvers that are subject to compute resource
constraints when handling requests from anonymous clients, this
document recommends that validating resolvers reduce their iteration
count limits over time. Specifically, validating resolver operators and
validating resolver software implementers are encouraged to continue
evaluating NSEC3 iteration count deployment trends and lower their
acceptable iteration limits over time.
After evaluation, we decided that the next major BIND release should
lower the maximum allowed NSEC3 iterations to 50, which should be
fine for 99,87% of the domain names.
The maximum DNS message size is 65535 octets. Check that the buffer
being passed to dns_message_renderbegin does not exceed this as the
compression code assumes that all offsets are no bigger than this.
EXPERIMENT: when DNS_DB_GLUEOK is set, dns_view_find() will now return
glue if it is found it a local zone database, without checking to see
if a better answer has been cached previously.
The current dispatch code could reuse the TCP connection when
dns_dispatch_gettcp() would be used first. This is problematic as the
dns_resolver doesn't use TCP connection sharing, but dns_request could
get the TCP stream that was created outside of the dns_request.
Add new DNS_DISPATCHOPT_UNSHARED option to dns_dispatch_createtcp() that
would prevent the TCP stream to be reused. Use that option in the
dns_resolver call to dns_dispatch_createtcp() to prevent dns_request
from reusing the TCP connections created by dns_resolver.
Additionally, the dns_xfrin unit added TCP connection sharing for
incoming transfers. While interleaving *xfr streams on a TCP connection
should work this should be a deliberate change and be property of the
server that can be controlled. Additionally some level of parallel TCP
streams is desirable. Revert to the old behaviour by removing the
dns_dispatch_gettcp() calls from dns_xfrin and use the new option to
prevent from sharing the transfer streams with dns_request.
The dns__catz_update_cb() does not expect that 'catzs->zones'
can become NULL during shutdown.
Add similar checks in the dns__catz_update_cb() and dns_catz_zone_get()
functions to protect from such a case. Also add an INSIST in the
dns_catz_zone_add() function to explicitly state that such a case
is not expected there, because that function is called only during a
reconfiguration.