Commit graph

716 commits

Author SHA1 Message Date
Ondřej Surý
5f15df5c53
Fix memory leak in QPcache addnoqname/addclosest mechanism
The attacker that controls DNSSEC-signed zone can trigger a memory leak
in the addnoqname() and/or addclosest() by creating more than
max-records-per-type RRSIG for any NSEC records.  The memory leaks have
been fixed.

(cherry picked from commit a854a5c83d)
2026-03-13 13:22:23 +01:00
Matthijs Mekking
63262fd0f4 Implement dns_dbiterator_seek3
This is a new seek function for dbiterator that is meant to find an
NSEC3 node in a zone database. The difference with dns_dbiterator_seek
is that if the node does not exist, this seek function will point the
iterator to the next NSEC3 name.

(cherry picked from commit 41159e9062)
2025-12-11 13:53:25 +01:00
Mark Andrews
b677d31fca
In dbiterator_prev, dereference_iter_node was being called too soon
dns_rbtnodechain_prev requires the current node to still be valid
which was not always the case after dereference_iter_node was called.
Move the call to dereference_iter_node to after the dns_rbtnodechain_prev
to preserve the node.
2025-12-08 10:25:17 +01:00
Evan Hunt
25c9fb54da standardize CHECK and RETERR macros
previously, there were over 40 separate definitions of CHECK macros, of
which most used "goto cleanup", and the rest "goto failure" or "goto
out". there were another 10 definitions of RETERR, of which most were
identical to CHECK, but some simply returned a result code instead of
jumping to a cleanup label.

this has now been standardized throughout the code base: RETERR is for
returning an error code in the case of an error, and CHECK is for jumping
to a cleanup tag, which is now always called "cleanup". both macros are
defined in isc/util.h.

(cherry picked from commit 52bba5cc34)
2025-12-03 19:17:20 -08:00
Mark Andrews
f8cafb9756 Fix missing RRSIGs for "glue" lookups with CD=1
The code to test whether to store the RRSIGs on DNS_R_UNCHANGED
with CD=1 was failing because the comparison methods of the two
rdatatset instances were not compatible.  Move the testing into
dns_db_addrdataset(), and request it by setting the DNS_ADD_EQUALOK
option.  If the option is set and the old and new rrsets compare
as equal, dns_db_addrdataset() returns ISC_R_SUCCESS instead of
DNS_R_UNCHANGED.

(cherry picked from commit b954a1df43)
2025-09-10 17:08:52 +10:00
Ondřej Surý
08328a9cce
Don't preserve cache entries if new TTL is smaller than existing
Under certain circumstances, cache entries with equivalent rdataset
might not get replaced.  Previously such entry would get preserved
regardless of the new TTL and expire time on the existing header would
get updated when the expire time was less than the expire time on the
existing header.  Change the logic to preserve the existing header only
if the new expire time is larger than the existing one and replace the
existing cache entry when the new expire time is less than the existing
one.

Co-authored-by: Jinmei Tatuya <jtatuya@infoblox.com>
(cherry picked from commit 9f7ba584cf)
2025-08-26 21:13:25 +02:00
Ondřej Surý
06e3d996c1
Preserve ZEROTTL attribute when replacing NS RRset
Previously, BIND 9 would drop the ZEROTTL attribute when updating
previously cached NS entry with ZEROTTL attribute set.

Co-authored-by: Jinmei Tatuya <jtatuya@infoblox.com>
(cherry picked from commit 982ca161c2)
2025-08-26 21:12:21 +02:00
Mark Andrews
ae3e67717c Fix "CNAME and other data" detection
prio_type was being used in the wrong place to optimize cname_and_other.
We have to first exclude and accepted types and we also have to
determine that the record exists before we can check if we are at
a point where a later CNAME cannot appear.

(cherry picked from commit 5e49a9e4ae)
2025-02-14 13:41:11 +11:00
Ondřej Surý
8229d9cdfa
Print the expiration time of the stale records (not ancient)
In #1870, the expiration time of ANCIENT records were printed, but
actually the ancient records are very short lived, and the information
carries a little value.

Instead of printing the expiration of ANCIENT records, print the
expiration time of STALE records.

(cherry picked from commit 355fc48472)
2025-02-04 18:07:59 +01:00
Ondřej Surý
302aca809d
Expand the usage of mark_ancient() helper functions
When the mark_ancient() helper function was introduced, couple of places
with duplicate (or almost duplicate) code was missed.  Move the
mark_ancient() function closer to the top of the file, and correctly use
it in places that mark the header as ANCIENT.

(cherry picked from commit 58179e6a19)
2025-02-03 15:53:34 +01:00
Ondřej Surý
4b114838de
Add better ZEROTTL handling in bindrdataset()
If we know that the header has ZEROTTL set, the server should never send
stale records for it and the TTL should never be anything else than 0.
The comment was already there, but the code was not matching the
comment.

(cherry picked from commit cfee6aa565)
2025-02-03 15:53:34 +01:00
Ondřej Surý
b32512a232
In cache, set rdataset TTL to 0 when the header is not active
When the header has been marked as ANCIENT, but the ttl hasn't been
reset (this happens in couple of places), the rdataset TTL would be
set to the header timestamp instead to a reasonable TTL value.

Since this header has been already expired (ANCIENT is set), set the
rdataset TTL to 0 and don't reuse this field to print the expiration
time when dumping the cache.  Instead of printing the time, we now
just print 'expired (awaiting cleanup'.

(cherry picked from commit 1bbb57f81b)
2025-02-03 15:53:34 +01:00
Ondřej Surý
857225aeb6
Clarify reference counting in RBTDB database
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:

- dns__rbtdb_newref() and dns__rbtdb_decref() are now called
  dns__rbtnode_acquire() and dns__rbtnode_release()
  respectively; this reflects the fact that they modify both
  the internal and external reference counters for a node.

- rbtnode_newref() and rbtnode_decref are now called
  rbtnode_erefs_increment() and rbtnode_erefs_decrement(),
  to reflect that they only increase and decrease the node's
  external reference counters, not internal.
2025-01-31 06:07:48 +01:00
Ondřej Surý
9c45de9473
Refactor node reference counting in rbtdb.c
Refactor the pattern in the newref() and decref() functions in rbtdb.c
following the pattern, so it follows the similar pattern we already have
for QPDB.
2025-01-31 05:52:13 +01:00
JINMEI Tatuya
da0453b1d5
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0.  In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.

Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>

(cherry picked from commit 7f4471594d)
2025-01-22 14:29:30 +01:00
Ondřej Surý
547f376f21
Rewrite the GLUE cache in QP zone database
This is a second attempt to rewrite the GLUE cache to not use per
database version hash table.  Instead of keeping a hash table indexed by
the node, use a directly linked list of GLUE records for each
slabheader.  This was attempted before, but there was a data race caused
by the fact that the thread cleaning the GLUE records could be slower
than accessing the slab headers again and reinitializing the wait-free
stack.

The improved design builds on the previous design, but adds a new
dns_gluelist structure that has a pointer to the database version.

If a dns_gluelist belonging to a different (old) version is detected, it
is just detached from the slabheader and left for the closeversion() to
clean it up later.

(cherry picked from commit 29bde687b5)
2025-01-06 14:00:47 +01:00
Ondřej Surý
ad952ffee6
Revert "Fix the glue table in the QP and RBT zone databases"
This reverts commit 46cfebac58.

(cherry picked from commit 759d59801b)
2025-01-06 14:00:43 +01:00
Ondřej Surý
db5803a0ec
Use attach()/detach() functions instead of touching .references
In rbtdb.c, there were places where the code touched .references
directly instead of using the helper functions.  Use the helper
functions instead.
2024-11-27 21:16:22 +01:00
JINMEI Tatuya
08122316a7 emit more helpful log for exceeding max-records-per-type
The new log message is emitted when adding or updating an RRset
fails due to exceeding the max-records-per-type limit. The log includes
the owner name and type, corresponding zone name, and the limit value.
It will be emitted on loading a zone file, inbound zone transfer
(both AXFR and IXFR), handling a DDNS update, or updating a cache DB.
It's especially helpful in the case of zone transfer, since the
secondary side doesn't have direct access to the offending zone data.

It could also be used for max-types-per-name, but this change
doesn't implement it yet as it's much less likely to happen
in practice.

(cherry picked from commit 4156995431)
2024-11-27 11:17:34 +11:00
Ondřej Surý
58a15d38c2
Remove redundant parentheses from the return statement
(cherry picked from commit 0258850f20)
2024-11-19 14:26:52 +01:00
Ondřej Surý
46cfebac58 Fix the glue table in the QP and RBT zone databases
When adding glue to the header, we add header to the wait-free stack to
be cleaned up later which sets wfc_node->next to non-NULL value.  When
the actual cleaning happens we would only cleanup the .glue_list, but
since the database isn't locked for the time being, the headers could be
reused while cleaning the existing glue entries, which creates a data
race between database versions.

Revert the code back to use per-database-version hashtable where keys
are the node pointers.  This allows each database version to have
independent glue cache table that doesn't affect nodes or headers that
could already "belong" to the future database version.

(cherry picked from commit 5beae5faf9)
2024-08-05 14:43:18 +00:00
Ondřej Surý
57cd34441a
Be smarter about refusing to add many RR types to the database
Instead of outright refusing to add new RR types to the cache, be a bit
smarter:

1. If the new header type is in our priority list, we always add either
   positive or negative entry at the beginning of the list.

2. If the new header type is negative entry, and we are over the limit,
   we mark it as ancient immediately, so it gets evicted from the cache
   as soon as possible.

3. Otherwise add the new header after the priority headers (or at the
   head of the list).

4. If we are over the limit, evict the last entry on the normal header
   list.
2024-07-01 12:48:51 +02:00
Ondřej Surý
b27c6bcce8
Expand the list of the priority types and move it to db_p.h
Add HTTPS, SVCB, SRV, PTR, NAPTR, DNSKEY and TXT records to the list of
the priority types that are put at the beginning of the slabheader list
for faster access and to avoid eviction when there are more types than
the max-types-per-name limit.
2024-07-01 12:47:30 +02:00
Ondřej Surý
52b3d86ef0
Add a limit to the number of RR types for single name
Previously, the number of RR types for a single owner name was limited
only by the maximum number of the types (64k).  As the data structure
that holds the RR types for the database node is just a linked list, and
there are places where we just walk through the whole list (again and
again), adding a large number of RR types for a single owner named with
would slow down processing of such name (database node).

Add a configurable limit to cap the number of the RR types for a single
owner.  This is enforced at the database (rbtdb, qpzone, qpcache) level
and configured with new max-types-per-name configuration option that
can be configured globally, per-view and per-zone.
2024-06-10 16:55:09 +02:00
Ondřej Surý
32af7299eb
Add a limit to the number of RRs in RRSets
Previously, the number of RRs in the RRSets were internally unlimited.
As the data structure that holds the RRs is just a linked list, and
there are places where we just walk through all of the RRs, adding an
RRSet with huge number of RRs inside would slow down processing of said
RRSets.

Add a configurable limit to cap the number of the RRs in a single RRSet.
This is enforced at the database (rbtdb, qpzone, qpcache) level and
configured with new max-records-per-type configuration option that can
be configured globally, per-view and per-zone.
2024-06-10 16:55:07 +02:00
Evan Hunt
2c88946590 dns_name_dupwithoffsets() cannot fail
this function now always returns success; change it to void and
clean up its callers.
2024-04-10 22:51:07 -04:00
Evan Hunt
b3c8b5cfb2 remove dead code in rbtdb.c
dns_db_addrdataset() enforces a requirement that version can only
be NULL for a cache database. code that checks for zone semantics
and version == NULL can never be reached.
2024-03-13 17:15:18 -07:00
Ondřej Surý
454c75a33a
Restore the parent cleaning logic in prune_tree()
Reconstruct the variant of the prune_tree() parent cleaning to consider
all elibible parents in a single loop as we were doing before all the
changes that led to this commit.

Update code comments so that they more precisely describe what the
relevant bits of code actually do.
2024-03-06 13:03:17 +01:00
Evan Hunt
845f832308 rename dns_rbtdb to dns_qpdb
this commit renames all variables and macros with the string "rbtdb"
or "RBDTB" to "qpdb" or "QPDB".
2024-03-06 09:57:24 +01:00
Ondřej Surý
d8220ca4ca
Make the TTL-based cleaning more aggressive
It was discovered that the TTL-based cleaning could build up
a significant backlog of the rdataset headers during the periods where
the top of the TTL heap isn't expired yet.  Make the TTL-based cleaning
more aggressive by cleaning more headers from the heap when we are
adding new header into the RBTDB.
2024-02-29 12:57:06 +01:00
Ondřej Surý
a9383e4b95
Remove expired rdataset headers from the heap
It was discovered that an expired header could sit on top of the heap
a little longer than desireable.  Remove expired headers (headers with
rdh_ttl set to 0) from the heap completely, so they don't block the next
TTL-based cleaning.
2024-02-29 12:56:36 +01:00
Ondřej Surý
0b32d323e0
Simplify the parent cleaning in the prune_tree() mechanism
Instead of juggling with node locks in a cycle, cleanup the node we are
just pruning and send any the parent that's also subject to the pruning
to the prune tree via normal way (e.g. enqueue pruning on the parent).

This simplifies the code and also spreads the pruning load across more
event loop ticks which is better for lock contention as less things run
in a tight loop.
2024-02-29 11:23:03 +01:00
Ondřej Surý
eed17611d8
Reduce lock contention during RBTDB tree pruning
The log message for commit 24381cc36d
explained:

    In some older BIND 9 branches, the extra queuing overhead eliminated by
    this change could be remotely exploited to cause excessive memory use.
    Due to architectural shift, this branch is not vulnerable to that issue,
    but applying the fix to the latter is nevertheless deemed prudent for
    consistency and to make the code future-proof.

However, it turned out that having a single queue for the nodes to be
pruned increased lock contention to a level where cleaning up nodes from
the RBTDB took too long, causing the amount of memory used by the cache
to grow indefinitely over time.

This commit reverts the change to the pruning mechanism introduced by
commit 24381cc36d as BIND branches newer
than 9.16 were not affected by the excessive event queueing overhead
issue mentioned in the log message for the above commit.
2024-02-29 11:23:03 +01:00
Evan Hunt
e40fd4ed06 fix several bugs in the RBTDB dbiterator implementation
- the DNS_DB_NSEC3ONLY and DNS_DB_NONSEC3 flags are mutually
  exclusive; it never made sense to set both at the same time.
  to enforce this, it is now a fatal error to do so.  the
  dbiterator implementation has been cleaned up to remove
  code that treated the two as independent: if nonsec3 is
  true, we can be certain nsec3only is false, and vice versa.
- previously, iterating a database backwards omitted
  NSEC3 records even if DNS_DB_NONSEC3 had not been set. this
  has been corrected.
- when an iterator reaches the origin node of the NSEC3 tree, we
  need to skip over it and go to the next node in the sequence.
  the NSEC3 origin node is there for housekeeping purposes and
  never contains data.
- the dbiterator_test unit test has been expanded, several
  incorrect expectations have been fixed. (for example, the
  expected number of iterations has been reduced by one; we were
  previously counting the NSEC3 origin node and we should not
  have been doing so.)
2024-02-15 10:15:50 -08:00
Michał Kępień
8610799317 BIND 9.19.21
-----BEGIN SSH SIGNATURE-----
 U1NIU0lHAAAAAQAAARcAAAAHc3NoLXJzYQAAAAMBAAEAAAEBANamVSTMToLcHCXRu1f52e
 tTJWV3T1GSVrPYXwAGe6EVC7m9CTl06FZ9ZG/ymn1S1++dk4ByVZXf6dODe2Mu0RuqGmyf
 MUEMKXVdj3cEQhgRaMjBXvIZoYAsQlbHO2BEttomq8PhrpLRizDBq4Bv2aThM0XN2QqSGS
 ozwYMcPiGUoMVNcVrC4ZQ+Cptb5C4liqAcpRqrSo8l1vcNg5b1Hk6r7NFPdx542gsGMLae
 wZrnKn3LWz3ZXTGeK2cRmBxm/bydiVSCsc9XjB+tWtIGUpQsfaXqZ7Hs6t+1f1vsnu88oJ
 oi1dRBo3YNRl49UiCukXWayQrPJa8wwxURS9W28JMAAAADZ2l0AAAAAAAAAAZzaGE1MTIA
 AAEUAAAADHJzYS1zaGEyLTUxMgAAAQBSREyaosd+mY8kovqAvGYR8pOui/7gOi6pBprPGw
 RlOB5z6YOx5FOjbVL/YvBhKk2gbox++o8jCMEmdNNbWeO3U3uBvxCa+8QGARbuMV6vdoR4
 qjnOgOfryXyaRw7PQX0ZH0gPw1B1036y5bnW7WPkqrTvGgxW34O1q6j0EumE0vh90E24/l
 PAWKDCTqDR/+slGDuWgtPcCZuClljw1Mh0dAliKkGhp0l80qMQSr6O/p66A44UxzKwtnnt
 lagtO0j4nZ+BxC/hyaFc/FlCzeoc48qFQRIt0ZjYKU+XK0CUr2RTpYFdi/n7y3BNd7bDkD
 nIkEDddn/lXP5rkAdkmDCa
 -----END SSH SIGNATURE-----
gpgsig -----BEGIN SSH SIGNATURE-----
 U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAg25GGAuUyFX1gxo7QocNm8V6J/8
 frHSduYX7Aqk4iJLwAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
 AAAAQEGqBHXwCtEJxRzHbTp6CfBNjqwIAjRD9G+HC4M7q77KBEBgc6dRf15ZRRgiWJCk5P
 iHMZkEMyWCnELMzhiTzgE=
 -----END SSH SIGNATURE-----

Merge tag 'v9.19.21'

BIND 9.19.21
2024-02-14 13:24:56 +01:00
Evan Hunt
ac9bd03a0d clean up dns_rbt
- create_node() in rbt.c cannot fail
- the dns_rbt_*name() functions, which are wrappers around
  dns_rbt_[add|find|delete]node(), were never used except in tests.

this change isn't really necessary since RBT is likely to go away
eventually anyway. but keeping the API as simple as possible while it
persists is a good thing, and may reduce confusion while QPDB is being
developed from RBTDB code.
2024-02-14 01:36:44 -08:00
Evan Hunt
78d173b548 move DNS_RBT_NSEC_* to db.h
these values pertain to whether a node is in the main, nsec, or nsec3
tree of an RBTDB. they need to be moved to a more generic location so
they can also be used by QPDB.

(this is in db.h rather than db_p.h because rbt.c needs access to it.
technically, that's a layer violation, but it's a long-existing one;
refactoring to get rid of it would be a large hassle, and eventually
we expect to remove rbt.c anyway.)
2024-02-14 01:13:44 -08:00
Evan Hunt
27c862d953 separate generic DB helpers into db_p.h
when the QPDB is implemented, we will need to have both qpdb_p.h and
rbtdb_p.h. in order to prevent name collisions or code duplication,
this commit adds a generic private header file, db_p.h, containing
structures and macros that will be used by both databases.

some functions and structs have been renamed to more specifically refer
to the RBT database, in order to avoid namespace collision with similar
things that will be needed by the QPDB later.
2024-02-14 09:00:27 +01:00
Ondřej Surý
3f774c2a8a
Optimize cname_and_other_data to stop as earliest as possible
Stop the cname_and_other_data processing if we already know that the
result is true.  Also, we know that CNAME will be placed in the priority
headers, so we can stop looking for CNAME if we haven't found CNAME and
we are past the priority headers.
2024-02-08 08:33:36 +01:00
Ondřej Surý
3ac482be7f
Optimize the slabheader placement for certain RRTypes
Mark the infrastructure RRTypes as "priority" types and place them at
the beginning of the rdataslab header data graph.  The non-priority
types either go right after the priority types (if any).
2024-02-08 08:33:36 +01:00
Michał Kępień
24381cc36d
Limit isc_async_run() overhead for tree pruning
Instead of issuing a separate isc_async_run() call for every RBTDB node
that triggers tree pruning, maintain a list of nodes from which tree
pruning can be started from and only issue an isc_async_run() call if
pruning has not yet been triggered by another RBTDB node.

In some older BIND 9 branches, the extra queuing overhead eliminated by
this change could be remotely exploited to cause excessive memory use.
Due to architectural shift, this branch is not vulnerable to that issue,
but applying the fix to the latter is nevertheless deemed prudent for
consistency and to make the code future-proof.
2024-01-05 12:33:14 +01:00
Mark Andrews
5e8f0e9ceb Process the combined LRU lists in LRU order
Only cleanup headers that are less than equal to the rbt's last_used
time.  Adjust the rbt's last_used time when the target cleaning was
not achieved to the oldest value of the remaining set of headers.

When updating delegating NS and glue records last_used was not being
updated when it should have been.

When adding zero TTL records to the tail of the LRU lists set
last_used to rbtdb->last_used + 1 rather than now.  This appoximately
preserves the lists LRU order.
2023-12-07 02:59:04 +00:00
Ondřej Surý
89fcb6f897
Apply the isc_mem_cget semantic patch 2023-08-31 22:08:35 +02:00
Ondřej Surý
8c4cf5b1de
fixup! Use cds_lfht for updatenotify mechanism in dns_db unit 2023-07-31 18:11:34 +02:00
Ondřej Surý
a1afa31a5a
Use cds_lfht for updatenotify mechanism in dns_db unit
The updatenotify mechanism in dns_db relied on unlocked ISC_LIST for
adding and removing the "listeners".  The mechanism relied on the
exclusive mode - it should have been updated only during reconfiguration
of the server.  This turned not to be true anymore in the dns_catz - the
updatenotify list could have been updated during offloaded work as the
offloaded threads are not subject to the exclusive mode.

Change the update_listeners to be cds_lfht (lock-free hash-table), and
slightly refactor how register and unregister the callbacks - the calls
are now idempotent (the register call already was and the return value
of the unregister function was mostly ignored by the callers).
2023-07-31 18:11:34 +02:00
Ondřej Surý
b6b0d81a36
Cleanup the __tsan_acquire/__tsan_release
With ThreadSanitizer support added to the Userspace RCU, we no longer
need to wrap the call_rcu and caa_container_of with
__tsan_{acquire,release} hints.  Remove the direct calls to
__tsan_{acquire,release} and the isc_urcu_{container,cleanup} macros.
2023-07-28 08:59:08 +02:00
Ondřej Surý
5321c474ea Refactor isc_stats_create() and its downstream users to return void
The isc_stats_create() can no longer return anything else than
ISC_R_SUCCESS.  Refactor isc_stats_create() and its variants in libdns,
libns and named to just return void.
2023-07-27 11:37:44 +02:00
Evan Hunt
5a85135c1e
split out cache-specific functions
move cache-specific functions from rbtdb.c to rbt-cachedb.c.
2023-07-17 14:50:25 +02:00
Evan Hunt
9a1a1293c0
split out zone-specific functions
move zone-specific functions from rbtdb.c to rbt-zonedb.c.
2023-07-17 14:50:25 +02:00
Evan Hunt
445ef1d033
move slab rdataset implementation to rdataslab.c
ultimately we want the slab implementation of dns_rdataset to
be usable by more database implementaions than just rbtdb. this
commit moves rdataset_methods to rdataslab.c, renamed
dns_rdataslab_rdatasetmethods.

new database methods have been added: locknode, unlocknode,
addglue, expiredata, and deletedata, allowing external functions to
perform functions that previously required internal access to the
database implementation.

database and heap pointers are now stored in the dns_slabheader object
so that header is the only thing that needs to be passed to some
functions; this will simplify moving functions that process slabheaders
out of rbtdb.c so they can be used by other database implementations.
2023-07-17 14:50:25 +02:00