Commit graph

55 commits

Author SHA1 Message Date
Evan Hunt
8e31dc5353 Fix a stack use-after-free in qpzone
In previous_closest_nsec(), a new qpreader was opened to search the NSEC
tree. It was possible for that to be used to update a QP iterator object
owned by the caller, and then be destroyed when the function returned.

This has been addressed by having the caller open the NSEC qpreader
instead.
2026-05-05 16:17:25 -07:00
Alessio Podda
04c52148bb Do not update the case on unchanged rdatasets
Fix an assertion failure on unchanged rdataset during IXFR.
2026-02-24 13:04:19 +01:00
Alessio Podda
97f2816947 Fix formatting
Cleanup formatting after IXFR changes.

(cherry picked from commit ad0a382092)
2026-02-02 10:32:38 +01:00
Alessio Podda
0a5e27deef Implement qpzone specific update path
This commit implements a batch update function for qpzone. The main
reason for this is speed: using addrdataset would cause a qp transaction
per rrdataset added, leading to a substantial slowdown compared to
RBTDB. The new API results in a qp transaction per applied diff.

(cherry picked from commit da53708dcb)
2026-02-02 10:32:38 +01:00
Alessio Podda
1d4ad50e81 Abstract updates into a vtable
This commit adds a layer of indirection to the apply_diff logic used by
IXFR and resigning by having the database updates go through a vtable.

We do this in three steps:
 - We extend dns_rdatacallbacks_t vtable to allow subtraction and
   resigning.
 - We add a new set of api (begin|commit|abort)update to the dbmethods
   vtable, that model an incremental update that can be aborted.
 - We extract the core logic of diff_apply into a function that
   satisfies the new interface.
 - We make diff_apply use this new function, and log the results.

The intent of this commit is to allow databases to expose a batch
incremental update implementation, just like they expose a custom
batch creation implementation through (begin|end)load.

(cherry picked from commit e36dc0ca76)
2026-01-29 09:13:02 +01:00
Matthijs Mekking
63262fd0f4 Implement dns_dbiterator_seek3
This is a new seek function for dbiterator that is meant to find an
NSEC3 node in a zone database. The difference with dns_dbiterator_seek
is that if the node does not exist, this seek function will point the
iterator to the next NSEC3 name.

(cherry picked from commit 41159e9062)
2025-12-11 13:53:25 +01:00
Ondřej Surý
89478d95c3
In dns_qpiter_{prev,next}, defer dereference_iter_node call
dns_qpiter_{prev,next} requires the current iterator node to still be
valid which might not always the case after dereference_iter_node was
called.  Currently, this is ensured via closeversion() mechanism, but it
is not guaranteed to be true in the future.

Move the call to dereference_iter_node to after the dns_qpiter_prev()
and dns_qpiter_next() to prevent a possible use-after-free of the
current iterator node.

(cherry picked from commit 9914bd383e)
2025-12-08 10:25:05 +01:00
Evan Hunt
25c9fb54da standardize CHECK and RETERR macros
previously, there were over 40 separate definitions of CHECK macros, of
which most used "goto cleanup", and the rest "goto failure" or "goto
out". there were another 10 definitions of RETERR, of which most were
identical to CHECK, but some simply returned a result code instead of
jumping to a cleanup label.

this has now been standardized throughout the code base: RETERR is for
returning an error code in the case of an error, and CHECK is for jumping
to a cleanup tag, which is now always called "cleanup". both macros are
defined in isc/util.h.

(cherry picked from commit 52bba5cc34)
2025-12-03 19:17:20 -08:00
Michał Kępień
d0e0706797
Revert "qpzone find() function could set foundname incorrectly"
This reverts commit dd1050e938.
2025-05-06 09:14:18 +02:00
Evan Hunt
dd1050e938 qpzone find() function could set foundname incorrectly
when a requested name is found in the QP trie during a lookup, but its
records have been marked as nonexistent by a previous deletion, then
it's treated as a partial match, but the foundname could be left
pointing to the original qname rather than the parent. this could
lead to an assertion failure in query_findclosestnsec3().
2025-03-17 09:27:09 +00:00
Evan Hunt
bfa5dd8991 qpzone.c:step() could ignore rollbacks
the step() function (used for stepping to the prececessor or
successor of a database node) could overlook a node because
there was an rdataset marked IGNORE because it had been rolled
back, covering an active rdataset under it.

(cherry picked from commit 24eaff7adc)
2025-03-14 23:22:59 +00:00
Ondřej Surý
614f8c1ef1 Acquire the database reference before possibly last node release
Acquire the database refernce in the detachnode() to prevent the last
reference to be release while the NODE_LOCK being locked.  The NODE_LOCK
is locked/unlocked inside the RCU critical section, thus it is most
probably this should not pose a problem as the database uses call_rcu
memory reclamation, but this it is still safer to acquire the reference
before releasing the node.

(cherry picked from commit d1ef6a93c1)
2025-03-06 10:39:17 +00:00
Ondřej Surý
ee6e64df21 Revert "fix: dev: Delete dead nodes when committing a new version"
This reverts commit 67255da4b3, reversing
changes made to 74c9ff384e.

(cherry picked from commit 1e4695510a)
2025-03-05 17:28:44 +00:00
Evan Hunt
e35e701c2c when committing a new qpzone version, delete dead nodes
if all data has been deleted from a node in the qpzone
database, delete the node too.

(cherry picked from commit e58ce19cf2)
2025-02-18 22:55:20 +00:00
Evan Hunt
a21168a221 fix dns_qp_insert() checks in qpzone
in some places there were checks for failures of dns_qp_insert()
after dns_qp_getname(). such failures could only happen if another
thread inserted a node between the two calls, and that can't happen
because the calls are serialized with dns_qpmulti_write(). we can
simplify the code and just add an INSIST.

(cherry picked from commit fffa150df3)
2025-02-18 05:55:02 +00:00
Mark Andrews
ae3e67717c Fix "CNAME and other data" detection
prio_type was being used in the wrong place to optimize cname_and_other.
We have to first exclude and accepted types and we also have to
determine that the record exists before we can check if we are at
a point where a later CNAME cannot appear.

(cherry picked from commit 5e49a9e4ae)
2025-02-14 13:41:11 +11:00
Ondřej Surý
db2bce1c6f
Switch the locknum generation for qpznode to random
Instead of using on hash of the name modulo number of the buckets,
assign the locknum randomly with isc_random_uniform().  This makes
the locknum assignment aligned with qpcache and allows the bucket
number to be non-prime in the future.

(cherry picked from commit 732fc338a9)
2025-02-04 23:28:53 +01:00
Ondřej Surý
d4e8a92977
Rely on call_rcu() to destroy the qpzone outside of locks
Reduce the number of qpzone_ref() and qpzone_unref() calls in
qpzone_detachnode() by relying on the call_rcu to delay
the destruction of the lock buckets.

(cherry picked from commit 1fa5219fdf)
2025-02-04 23:28:53 +01:00
Ondřej Surý
c6c03a6b11
Reduce false sharing in dns_qpzone
Instead of having many node_lock_count * sizeof(<member>) arrays, pack
all the members into a qpzone_bucket_t that is cacheline aligned and have
a single array of those.

(cherry picked from commit 6dcc398726)
2025-02-04 23:28:50 +01:00
Evan Hunt
5300eebc9e
Clarify reference counting in QP databases
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:

- newref() and decref() are now called qpcnode_acquire()/
  qpznode_acquire() and qpcnode_release()/qpznode_release()
  respectively; this reflects the fact that they modify both
  the internal and external reference counters for a node.

- qpcnode_newref() and qpznode_newref() are now called
  qpcnode_erefs_increment() and qpznode_erefs_increment(), and
  qpcnode_decref() and qpznode_decref() are now called
  qpcnode_erefs_decrement() and qpznode_erefs_decrement(),
  to reflect that they only increase and decrease the node's
  external reference counters, not internal.

(cherry picked from commit d4f791793e)
2025-01-31 05:52:13 +01:00
Ondřej Surý
7dab6cdfbc
Remove db_nodelock_t in favor of reference counted qpdb
This removes the db_nodelock_t structure and changes the node_locks
array to be composed only of isc_rwlock_t pointers.  The .reference
member has been moved to qpdb->references in addition to
common.references that's external to dns_db API users.  The .exiting
members has been completely removed as it has no use when the reference
counting is used correctly.

(cherry picked from commit 431513d8b3)
2025-01-31 05:49:36 +01:00
Ondřej Surý
d1d444d2ab
Refactor decref() in both qpcache.c and qpzone.c
Cleanup the pattern in the decref() functions in both qpcache.c and
qpzone.c, so it follows the similar patter as we already have in
newref() function.

(cherry picked from commit 814b87da64)
2025-01-31 05:49:12 +01:00
JINMEI Tatuya
da0453b1d5
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0.  In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.

Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>

(cherry picked from commit 7f4471594d)
2025-01-22 14:29:30 +01:00
Ondřej Surý
547f376f21
Rewrite the GLUE cache in QP zone database
This is a second attempt to rewrite the GLUE cache to not use per
database version hash table.  Instead of keeping a hash table indexed by
the node, use a directly linked list of GLUE records for each
slabheader.  This was attempted before, but there was a data race caused
by the fact that the thread cleaning the GLUE records could be slower
than accessing the slab headers again and reinitializing the wait-free
stack.

The improved design builds on the previous design, but adds a new
dns_gluelist structure that has a pointer to the database version.

If a dns_gluelist belonging to a different (old) version is detected, it
is just detached from the slabheader and left for the closeversion() to
clean it up later.

(cherry picked from commit 29bde687b5)
2025-01-06 14:00:47 +01:00
Ondřej Surý
ad952ffee6
Revert "Fix the glue table in the QP and RBT zone databases"
This reverts commit 46cfebac58.

(cherry picked from commit 759d59801b)
2025-01-06 14:00:43 +01:00
Alessio Podda
1edf405add Optimize memory layout of core structs
Reduce memory footprint by:

 - Reordering struct fields to minimize padding.
 - Using exact-sized atomic types instead of *_least/*_fast variants
 - Downsizing integer fields where possible

Affected structs:

 - dns_name_t
 - dns_slabheader_t
 - dns_rdata_t
 - qpcnode_t
 - qpznode_t

(cherry picked from commit 32c7060bd2)
2024-12-09 09:04:28 +01:00
JINMEI Tatuya
08122316a7 emit more helpful log for exceeding max-records-per-type
The new log message is emitted when adding or updating an RRset
fails due to exceeding the max-records-per-type limit. The log includes
the owner name and type, corresponding zone name, and the limit value.
It will be emitted on loading a zone file, inbound zone transfer
(both AXFR and IXFR), handling a DDNS update, or updating a cache DB.
It's especially helpful in the case of zone transfer, since the
secondary side doesn't have direct access to the offending zone data.

It could also be used for max-types-per-name, but this change
doesn't implement it yet as it's much less likely to happen
in practice.

(cherry picked from commit 4156995431)
2024-11-27 11:17:34 +11:00
Ondřej Surý
58a15d38c2
Remove redundant parentheses from the return statement
(cherry picked from commit 0258850f20)
2024-11-19 14:26:52 +01:00
Matthijs Mekking
d768dd1f5d Revert "fix: chg: Improve performance when looking for the closest encloser when returning NSEC3 proofs"
This reverts merge request !9436

(cherry picked from commit 0396bf98ee)
2024-10-10 09:29:52 +00:00
Mark Andrews
b30bff7dee Return partial match when requested
Return partial match from dns_db_find/dns_db_find when requested
to short circuit the closest encloser discover process.  Most of the
time this will be the actual closest encloser but may not be when
there yet to be committed / cleaned up versions of the zone with
names below the actual closest encloser.

(cherry picked from commit d42ea08f16)
2024-08-29 21:40:16 +00:00
Ondřej Surý
46cfebac58 Fix the glue table in the QP and RBT zone databases
When adding glue to the header, we add header to the wait-free stack to
be cleaned up later which sets wfc_node->next to non-NULL value.  When
the actual cleaning happens we would only cleanup the .glue_list, but
since the database isn't locked for the time being, the headers could be
reused while cleaning the existing glue entries, which creates a data
race between database versions.

Revert the code back to use per-database-version hashtable where keys
are the node pointers.  This allows each database version to have
independent glue cache table that doesn't affect nodes or headers that
could already "belong" to the future database version.

(cherry picked from commit 5beae5faf9)
2024-08-05 14:43:18 +00:00
Ondřej Surý
b27c6bcce8
Expand the list of the priority types and move it to db_p.h
Add HTTPS, SVCB, SRV, PTR, NAPTR, DNSKEY and TXT records to the list of
the priority types that are put at the beginning of the slabheader list
for faster access and to avoid eviction when there are more types than
the max-types-per-name limit.
2024-07-01 12:47:30 +02:00
Ondřej Surý
52b3d86ef0
Add a limit to the number of RR types for single name
Previously, the number of RR types for a single owner name was limited
only by the maximum number of the types (64k).  As the data structure
that holds the RR types for the database node is just a linked list, and
there are places where we just walk through the whole list (again and
again), adding a large number of RR types for a single owner named with
would slow down processing of such name (database node).

Add a configurable limit to cap the number of the RR types for a single
owner.  This is enforced at the database (rbtdb, qpzone, qpcache) level
and configured with new max-types-per-name configuration option that
can be configured globally, per-view and per-zone.
2024-06-10 16:55:09 +02:00
Ondřej Surý
32af7299eb
Add a limit to the number of RRs in RRSets
Previously, the number of RRs in the RRSets were internally unlimited.
As the data structure that holds the RRs is just a linked list, and
there are places where we just walk through all of the RRs, adding an
RRSet with huge number of RRs inside would slow down processing of said
RRSets.

Add a configurable limit to cap the number of the RRs in a single RRSet.
This is enforced at the database (rbtdb, qpzone, qpcache) level and
configured with new max-records-per-type configuration option that can
be configured globally, per-view and per-zone.
2024-06-10 16:55:07 +02:00
Evan Hunt
9c882f1e69 replace qpzone node attriutes with atomics
there were TSAN error reports because of conflicting uses of
node->dirty and node->nsec, which were in the same qword.

this could be resolved by separating them, but we could also
make them into atomic values and remove some node locking.
2024-05-17 00:33:35 +00:00
Evan Hunt
4b02246130 fix more ambiguous struct names
there were some structure names used in qpcache.c and qpzone.c that
were too similar to each other and could be confusing when debugging.
they have been changed as follows:

in qcache.c:
- changed_t was unused, and has been removed
- search_t -> qpc_search_t
- qpdb_rdatasetiter_t -> qpc_rditer_t
- qpdb_dbiterator_t -> qpc_dbiter_t

in qpzone.c:
- qpdb_changed_t -> qpz_changed_t
- qpdb_changedlist_t -> qpz_changedlist_t
- qpdb_version_t -> qpz_version_t
- qpdb_versionlist_t -> qpz_versionlist_t
- qpdb_search_t -> qpz_search_t
- qpdb_load_t -> qpz_search_t
2024-04-30 12:50:01 -07:00
Evan Hunt
2789e58473 get foundname from the node
when calling dns_qp_lookup() from qpcache, instead of passing
'foundname' so that a name would be constructed from the QP key,
we now just use the name field in the node data. this makes
dns_qp_lookup() run faster.

the same optimization has also been added to qpzone.

the documentation for dns_qp_lookup() has been updated to
discuss this performance consideration.
2024-04-30 12:50:01 -07:00
Evan Hunt
85ab92b6e0 more cleanups in qpcache.c
- remove unneeded struct members and misleading comments.
- remove unused parameters for static functions.
- rename 'find_callback' to 'delegating' for consistency with qpzone;
  the find callback mechanism is not used in QP databases.
2024-04-30 12:42:31 -07:00
Evan Hunt
3acab71d46 rename QPDB_HEADERNODE to HEADERNODE
this makes the macro consistent between qpcache.c and qpzone.c.

also removed a redundant definition of HEADERNODE in qpzone.c.
2024-04-30 12:42:31 -07:00
Evan Hunt
46d40b3dca fix structure names in qpcache.c and qpzone.c
- change dns_qpdata_t to qpcnode_t (QP cache node), and dns_qpdb_t to
  qpcache_t, as these types are only accessed locally.
- also change qpdata_t in qpzone.c to qpznode_t (QP zone node), for
  consistency.
- make the refcount declarations for qpcnode_t and qpznode_t static,
  using the new ISC_REFCOUNT_STATIC macros.
2024-04-30 12:42:07 -07:00
Ondřej Surý
6c54337f52 avoid a race in the qpzone getsigningtime() implementation
the previous commit introduced a possible race in getsigningtime()
where the rdataset header could change between being found on the
heap and being bound.

getsigningtime() now looks at the first element of the heap, gathers the
locknum, locks the respective lock, and retrieves the header from the
heap again.  If the locknum has changed, it will rinse and repeat.
Theoretically, this could spin forever, but practically, it almost never
will as the heap changes on the zone are very rare.

we simplify matters further by changing the dns_db_getsigningtime()
API call. instead of passing back a bound rdataset, we pass back the
information the caller actually needed: the resigning time, owner name
and type of the rdataset that was first on the heap.
2024-04-25 15:48:43 -07:00
Evan Hunt
7e6be9f1b5 simplify qpzone database by using only one heap for resigning
in RBTDB, the heap was used by zone databases for resigning, and
by the cache for TTL-based cache cleaning. the cache use case required
very frequent updates, so there was a separate heap for each of the
node lock buckets.

qpzone is for zones only, so it doesn't need to support the cache
use case; the heap will only be touched when the zone is updated or
incrementally signed. we can simplify the code by using only a single
heap.
2024-04-25 15:41:39 -07:00
Evan Hunt
ea6659a5e9
update foundname when detecting a zonecut above qname
an assertion could be triggered in the QPDB cache if a DNAME
was found above a queried NS, because the 'foundname' value was
not correctly updated to point to the zone cut.

the same mistake existed in qpzone and has been fixed there as well.
2024-04-02 10:00:03 +02:00
Mark Andrews
4d2d80f534 Remove remenants of cache support from qpzone.c
These where leading to Coverity errors being reported.
2024-03-19 22:04:10 +00:00
Evan Hunt
f908d358c4 reduce memory consumption of qpzone database
every node of a QP database contains a copy of the nodename,
which is used as the key for the QP-trie. previously, the name
was stored as a dns_fixedname object, which has room for up to
255 characters. we can reduce the space consumed by dynamically
allocating a dns_name object that's just long enough for the name
to be stored.
2024-03-14 10:20:52 -07:00
Matthijs Mekking
ad33a73f83 Fix Coverity CID 487882: Error handling issues
The dns_qpiter_next() was called without checking the return value. If
we cannot move the iterator forward, there is no use in calling the
step() function.

/lib/dns/qpzone.c: 2804 in activeempty()
2798     	 * of the name we were searching for. Step the iterator
2799     	 * forward, then step() will continue forward until it
2800     	 * finds a node with active data. If that node is a
2801     	 * subdomain of the one we were looking for, then we're
2802     	 * at an active empty nonterminal node.
2803     	 */
>>>     CID 487882:  Error handling issues  (CHECKED_RETURN)
>>>     Calling "dns_qpiter_next" without checking return value (as is done elsewhere 26 out of 27 times).
2804     	dns_qpiter_next(it, NULL, NULL, NULL);
2805     	return (step(search, it, FORWARD, next) &&
2806     		dns_name_issubdomain(next, current));
2807     }
2024-03-14 14:01:23 +01:00
Evan Hunt
f0b164430a remove dead code in qpzone.c
qpzone does not support cache semantics, so dns_db_addrdataset(),
_deleterdataset() and _subtractrdataset() can't be run with
version == NULL; there's no need to check for it.

we can also clean up free_qpdb() a bit since current_version
is always non-NULL.
2024-03-13 17:15:18 -07:00
Evan Hunt
ac2c454f4f add a nodefullname implementation for the qpzone database
this enables the 'dyndb' system test to use a qpzone database.
2024-03-08 15:36:56 -08:00
Evan Hunt
3512cf5654 add setup/commit functions to rdatacallbacks
because dns_qpmulti_commit() can be time consuming, it's inefficient
to open and commit a qpmulti transaction for each rdataset being loaded
into a database.  we can improve load time by opening a qpmulti
transaction before adding a group of rdatasets and then committing it
afterward.

this commit adds 'setup' and 'commit' functions to dns_rdatacallbacks_t,
which can be called before and after the loops in which 'add' is
called in dns_master_load() and axfr_apply().
2024-03-08 15:36:56 -08:00
Evan Hunt
55f38e34dc improve node reference counting
QP database node data is not reference counted the same way RBT nodes
were: in the RBT, node->references could be zero if the node was in the
tree but was not in use by any caller, whereas in the QP trie, the
database itself uses reference counting of nodes internally.

this caused some subtle errors. in RBTDB, when the newref() function is
called and the node reference count was zero, the node lock reference
counter would also be incremented. in the QP trie, this can never
happen - because as long as the node is in the database its reference
count cannot be zero - and so the node lock reference counter was never
incremented.

this has been addressed by maintaining a separate "erefs" counter for
external references to the node. this is the same approach used in the
"qpdb-lite" database in commit e91fbd8dea.

while troubleshooting this issue, some compile errors were discovered
when building with DNS_DB_NODETRACE; those have also been fixed.
2024-03-08 15:36:56 -08:00