Commit aab691d512 did not fix all possible
scenarios in which the ns_statscounter_recursclients counter underflows.
The solution implemented therein can be ineffective e.g. when CNAME
chaining happens with prefetching enabled.
Here is an example recursive resolution scenario in which the
ns_statscounter_recursclients counter can underflow with the current
logic in effect:
1. Query processing starts, the answer is not found in the cache, so
recursion is started. The NS_CLIENTATTR_RECURSING attribute is set.
ns_statscounter_recursclients is incremented (Δ = +1).
2. Recursion completes, returning a CNAME. client->recursionquota is
non-NULL, so the NS_CLIENTATTR_RECURSING attribute remains set.
ns_statscounter_recursclients is decremented (Δ = 0).
3. Query processing restarts.
4. The current QNAME (the target of the CNAME from step 2) is found in
the cache, with a TTL low enough to trigger a prefetch.
5. query_prefetch() attaches to client->recursionquota.
ns_statscounter_recursclients is not incremented because
query_prefetch() does not do that (Δ = 0).
6. Query processing restarts.
7. The current QNAME (the target of the CNAME from step 4) is not found
in the cache, so recursion is started. client->recursionquota is
already attached to (since step 5) and the NS_CLIENTATTR_RECURSING
attribute is set (since step 1), so ns_statscounter_recursclients is
not incremented (Δ = 0).
8. The prefetch from step 5 completes. client->recursionquota is
detached from in prefetch_done(). ns_statscounter_recursclients is
not decremented because prefetch_done() does not do that (Δ = 0).
9. Recursion for the current QNAME completes. client->recursionquota
is already detached from, i.e. set to NULL (since step 8), and the
NS_CLIENTATTR_RECURSING attribute is set (since step 1), so
ns_statscounter_recursclients is decremented (Δ = -1).
Another possible scenario is that after step 7, recursion for the target
of the CNAME from step 4 completes before the prefetch for the CNAME
itself. fetch_callback() then notices that client->recursionquota is
non-NULL and decrements ns_statscounter_recursclients, even though
client->recursionquota was attached to by query_prefetch() and therefore
not accompanied by an incrementation of ns_statscounter_recursclients.
The net result is also an underflow.
Instead of trying to properly handle all possible orderings of events
set into motion by normal recursion and prefetch-triggered recursion,
adjust ns_statscounter_recursclients whenever the recursive clients
quota is successfully attached to or detached from. Remove the
NS_CLIENTATTR_RECURSING attribute altogether as its only purpose is made
obsolete by this change.
(cherry picked from commit f7482b68b9)
While refactoring the libns to use the new network manager, the
max-transfer-*-out options were not implemented and they were turned
non-operational.
Reimplement the max-transfer-idle-out functionality using the write
timer and max-transfer-time-out using the new isc_nm_timer API.
(cherry picked from commit 8643bbab84)
While refactoring the lib/ns/xfrout.c, it was discovered that .shutdown
and .shutdown_arg members of ns_client_t structure are unused.
Remove the unused members and associated code that was using in it in
the ns_xfrout.
(cherry picked from commit 037549c405)
Use the isc_nmhandle_setwritetimeout() function in the netmgr unit test
to allow more time for writing and reading the responses because some of
the intervals that are used in the unit tests are really small leaving a
little room for any delays.
(cherry picked from commit ee359d6ffa)
In some situations (unit test and forthcoming XFR timeouts MR), we need
to modify the write timeout independently of the read timeout. Add a
isc_nmhandle_setwritetimeout() function that could be called before
isc_nm_send() to specify a custom write timeout interval.
(cherry picked from commit a89d9e0fa6)
When the outgoing TCP write buffers are full because the other party is
not reading the data, the uv_write() could wait indefinitely on the
uv_loop and never calling the callback. Add a new write timer that uses
the `tcp-idle-timeout` value to interrupt the TCP connection when we are
not able to send data for defined period of time.
(cherry picked from commit 408b362169)
The uv_tcp_close_reset() function was added in libuv 1.32.0 and since we
support older libuv releases, we have to add a shim uv_tcp_close_reset()
implementation loosely based on libuv.
(cherry picked from commit cd3b58622c)
Before adding the write timer, we have to remove the generic sock->timer
to sock->read_timer. We don't touch the function names to limit the
impact of the refactoring.
(cherry picked from commit 45a73c113f)
Replace the RUNTIME_CHECK() calls for libuv API calls with
UV_RUNTIME_CHECK() to get more detailed error message when
something fails and should not.
(cherry picked from commit 8715be1e4b)
When libuv functions fail, they return correct return value that could
be useful for more detailed debugging. Currently, we usually just check
whether the return value is 0 and invoke assertion error if it doesn't
throwing away the details why the call has failed. Unfortunately, this
often happen on more exotic platforms.
Add a UV_RUNTIME_CHECK() macro that can be used to print more detailed
error message (via uv_strerror() before ending the execution of the
program abruptly with the assertion.
(cherry picked from commit 62e15bb06d)
The task exclusive mode stops all processing (tasks and networking IO)
except the designated exclusive task events. This has impact on the
operation of the server. Add log messages indicating when we start the
exclusive mode, and when we end exclusive task mode.
(cherry picked from commit b9cb29076f)
The isc_thread_setaffinity call was removed in !5265 and we are not
going to restore it because it was proven that the performance is better
without it. Additionally, remove the already disabled cpu system test.
The isc_thread_setconcurrency function is unused and also calling
pthread_setconcurrency() on Linux has no meaning, formerly it was
added because of Solaris in 2001 and it was removed when taskmgr was
refactored to run on top of netmgr in !4918.
(cherry picked from commit 0500345513)
When isc_quota_attach_cb() API returns ISC_R_QUOTA (meaning hard quota
was reached) the accept_connection() would return without logging a
message about quota reached.
Change the connection callback to log the quota reached message.
(cherry picked from commit 2ae84702ad)
Apparently we forgot about DLZ when updating DNS_CLIENTINFO_VERSION
constant for ECS, which is at value "3" since ECS was introduced.
The code in example drivers and tests now hardcodes version numbers
2 (without ECS) and 3 (with ECS) depending on what a given code path
requires.
(cherry picked from commit f81debe1c8)
this brings DNS_CLIENTINFO_VERSION into line with the subscription
branch so that fixes applied to clientinfo processing can also be
applied to the main branch without diverging.
(cherry picked from commit 737e658602)
dns_dlzcreate() fails to free the memory allocated for dlzname
when an error occurs.
Free dlzname's memory (acquired earlier with isc_mem_strdup())
by calling isc_mem_free() before returning an error code.
(cherry picked from commit 4a6c66288f)
When a zone is being configured with a new view, the catalog zones
structure will also be linked to that view. Later on, in case of some
error, should the zone be reverted to the previous view, the link
between the catalog zones structure and the view won't be reverted.
Change the dns_zone_setviewrevert() function so it calls
dns_zone_catz_enable() during a zone revert, which will reset the
link between `catzs` and view.
(cherry picked from commit 2fd967136a)
Separate the locked parts of dns_zone_catz_enable() and
dns_zone_catz_disable() functions into static functions. This will
let us perform those tasks from the other parts of the module while
the zone is locked, avoiding one pair of additional unlocking and
locking operations.
(cherry picked from commit 6b937ed5f6)
The commit itself is harmless, but at the same time it is also useless,
so we are reverting it.
This reverts commit 11c869a3d5.
(cherry picked from commit 0a4e91ee47)
Previously, the netmgr/udp.c tried to detect the recvmmsg detection in
libuv with #ifdef UV_UDP_<foo> preprocessor macros. However, because
the UV_UDP_<foo> are not preprocessor macros, but enum members, the
detection didn't work. Because the detection didn't work, the code
didn't have access to the information when we received the final chunk
of the recvmmsg and tried to free the uvbuf every time. Fortunately,
the isc__nm_free_uvbuf() had a kludge that detected attempt to free in
the middle of the receive buffer, so the code worked.
However, libuv 1.37.0 changed the way the recvmmsg was enabled from
implicit to explicit, and we checked for yet another enum member
presence with preprocessor macro, so in fact libuv recvmmsg support was
never enabled with libuv >= 1.37.0.
This commit changes to the preprocessor macros to autoconf checks for
declaration, so the detection now works again. On top of that, it's now
possible to cleanup the alloc_cb and free_uvbuf functions because now,
the information whether we can or cannot free the buffer is available to
us.
(cherry picked from commit 7370725008)
This commit converts the license handling to adhere to the REUSE
specification. It specifically:
1. Adds used licnses to LICENSES/ directory
2. Add "isc" template for adding the copyright boilerplate
3. Changes all source files to include copyright and SPDX license
header, this includes all the C sources, documentation, zone files,
configuration files. There are notes in the doc/dev/copyrights file
on how to add correct headers to the new files.
4. Handle the rest that can't be modified via .reuse/dep5 file. The
binary (or otherwise unmodifiable) files could have license places
next to them in <foo>.license file, but this would lead to cluttered
repository and most of the files handled in the .reuse/dep5 file are
system test files.
(cherry picked from commit 58bd26b6cf)
The isc__nm_tcp_resumeread() was using maybe_enqueue function to enqueue
netmgr event which could case the read callback to be executed
immediately if there was enough data waiting in the TCP queue.
If such thing would happen, the read callback would be called before the
previous read callback was finished and the worker receive buffer would
be still marked "in use" causing a assertion failure.
This would affect only raw TCP channels, e.g. rndc and http statistics.
(cherry picked from commit 11c869a3d5)
While doing code review, it was found that the taskmgr->exiting is set
under taskmgr->lock, but accessed under taskmgr->excl_lock in the
isc_task_beginexclusive().
Additionally, before the change that moved running the tasks to the
netmgr, the task_ready() subrouting of isc_task_detach() would lock
mgr->lock, requiring the mgr->excl to be protected mgr->excl_lock
to prevent deadlock in the code. After !4918 has been merged, this is
no longer true, and we can remove taskmgr->excl_lock and use
taskmgr->lock in its stead.
Solve both issues by removing the taskmgr->excl_lock and exclusively use
taskmgr->lock to protect both taskmgr->excl and taskmgr->exiting which
now doesn't need to be atomic_bool, because it's always accessed from
within the locked section.
(cherry picked from commit e705f213ca)
The isc_taskmgr_excltask() would return ISC_R_NOTFOUND either when the
exclusive task was not set (yet) or when the taskmgr is shutting down
and the exclusive task has been already cleared.
Distinguish between the two states and return ISC_R_SHUTTINGDOWN when
the taskmgr is being shut down instead of ISC_R_NOTFOUND.
(cherry picked from commit f9d90159b8)
When the signed version of an inline-signed zone is dumped to disk, the
serial number of the unsigned version of the zone is stored in the
raw-format header so that the contents of the signed zone can be
resynchronized after named restart if the unsigned zone file is modified
while named is not running.
In order for the serial number of the unsigned zone to be determined
during the dump, zone->raw must be set to a non-NULL value. This should
always be the case as long as the signed version of the zone is used for
anything by named.
However, a scenario exists in which the signed version of the zone has
zone->raw set to NULL while it is being dumped:
1. Zone dump is requested; zone_dump() is invoked.
2. Another zone dump is already in progress, so the dump gets deferred
until I/O is available (see zonemgr_getio()).
3. The last external reference to the zone is released.
zone_shutdown() gets queued to the zone's task.
4. I/O becomes available for zone dumping. zone_gotwritehandle() gets
queued to the zone's task.
5. The zone's task runs zone_shutdown(). zone->raw gets set to NULL.
6. The zone's task runs zone_gotwritehandle(). zone->raw is determined
to be NULL, causing the serial number of the unsigned version of the
zone to be omitted from the raw-format dump of the signed zone file.
Note that the naïve solution - deferring the dns_zone_detach() call for
zone->raw until zone_free() gets called for the secure version of the
zone - does not work because it leads to a chicken-and-egg problem when
the inline-signed zone is about to get freed: the raw zone holds a weak
reference to the secure zone and that reference does not get released
until the reference count for the raw zone reaches zero, which in turn
would not happen until all weak references to the secure zone were
released.
Defer detaching from zone->raw in zone_shutdown() if the zone is in the
process of being dumped to disk. Ensure zone->raw gets detached from
after the dump is finished if detaching gets deferred. Prevent zone
dumping from being requeued upon failure if the zone is in the process
of being cleaned up as it opens up possibilities for the zone->raw
reference to leak, triggering a shutdown hang.
(cherry picked from commit ef625f5f06)
In some cases we want to keep expired signatures. For example, if the
KSK is offline, we don't want to fall back to signing with the ZSK.
We could remove the signatures, but in any case we end up with a broken
zone.
The change made for GL #763 prevented the behavior to sign the DNSKEY
RRset with the ZSK if the KSK was offline (and signatures were expired).
The change causes the definition of "having both keys": if one key is
offline, we still consider having both keys, so we don't fallback
signing with the ZSK if KSK is offline.
That change also works the other way, if the ZSK is offline, we don't
fallback signing with the KSK.
This commit fixes that, so we only fallback signing zone RRsets with
the KSK, not signing key RRsets with the ZSK.
(cherry picked from commit beeefe35c4)
BIND can log this warning:
zone example.ch/IN (signed): Key example.ch/ECDSAP256SHA256/56340
missing or inactive and has no replacement: retaining signatures.
This log can happen when BIND tries to remove signatures because the
are about to expire or to be resigned. These RRsets may be signed with
the KSK if the ZSK files has been removed from disk. When we have
created a new ZSK we can replace the signatures creeated by the KSK
with signatures from the new ZSK.
It complains about the KSK being missing or inactive, but actually it
takes the key id from the RRSIG.
The warning is logged if BIND detects the private ZSK file is missing.
The warning is logged even if we were able to delete the signature.
With the change from this commit it only logs this warning if it is not
okay to delete the signature.
(cherry picked from commit 2d2858841a)
When the signed version of an inline-signed zone is dumped to disk, the
serial number of the unsigned version of the zone is written in the
raw-format header so that the contents of the signed zone can be
resynchronized after named restart if the unsigned zone file is
modified while named is not running (see RT #26676).
In order for the serial number of the unsigned zone to be determined
during the dump, zone->raw must be set to a non-NULL value. This
should always be the case as long as the signed version of the zone is
used for anything by named.
However, under certain circumstances the zone->raw could be set to NULL
while the zone is being dumped.
Defer detaching from zone->raw in zone_shutdown() if the zone is in the
process of being dumped to disk.
(cherry picked from commit cc1d4e1aa6)
A kasp structure was not detached when looking to see if there
was an existing kasp structure with the same name, causing memory
to be leaked. Fixed by calling dns_kasp_detach() to release the
reference.
(cherry picked from commit 694440e614)
For small sized allocations, the internal allocator gets the memory in
bigger blobs that gets splits into right-sized chunks. This increases
speed of small allocations and reduced the fragmentation, but such
memory is never released back to the operating system.
Disable the internal allocator by default, and add new `-M internal`
command line option to `named`.
Previously, with BIND 9 internal allocator, when isc_mempool_put() would
return memory to the allocator, it would not be freed, but it would be
returned to the "freelists" and the memory would not be released to the
operating system.
Change the isc_mempool_get() and isc_mempool_put() to avoid the internal
allocator (mem_getunlocked() and mem_putunlocked()).
According to the measurements (recorded on GL!5085), the fillcount of 2
for namepool and fillcount of 4 for rdspool can fit 99.99% of request
for tested scenarios.
This was discovered by perf recording the single second recursive test
using flamethrower where the initial malloc lit up like a flare.
Current mempools are kind of hybrid structures - they serve two
purposes:
1. mempool with a lock is basically static sized allocator with
pre-allocated free items
2. mempool without a lock is a doubly-linked list of preallocated items
The first kind of usage could be easily replaced with jemalloc small
sized arena objects and thread-local caches.
The second usage not-so-much and we need to keep this (in
libdns:message.c) for performance reasons.
This prevents a direct leak in OPENSSL_init_crypto (called from
OPENSSL_init_ssl).
Add shim version of OPENSSL_cleanup because it is missing in LibreSSL on
OpenBSD.
(cherry picked from commit 89f4f8f0c8)
On FreeBSD, the pthread primitives are not solely allocated on stack,
but part of the object lives on the heap. Missing pthread_*_destroy
causes the heap memory to grow and in case of fast lived object it's
possible to run out-of-memory.
Properly destroy the leaking mutex (worker->lock) and
the leaking condition (sock->cond).
(cherry picked from commit 57d0fabadd)
Previously, when TCP accept failed, we have logged a message with
ISC_LOG_ERROR level. One common case, how this could happen is that the
client hits TCP client quota and is put on hold and when resumed, the
client has already given up and closed the TCP connection. In such
case, the named would log:
TCP connection failed: socket is not connected
This message was quite confusing because it actually doesn't say that
it's related to the accepting the TCP connection and also it logs
everything on the ISC_LOG_ERROR level.
Change the log message to "Accepting TCP connection failed" and for
specific error states lower the severity of the log message to
ISC_LOG_INFO.
(cherry picked from commit 20ac73eb22)
The following scenario triggers a "named" crash:
1. Configure a catalog zone.
2. Start "named".
3. Comment out the "catalog-zone" clause.
4. Run `rndc reconfig`.
5. Uncomment the "catalog-zone" clause.
6. Run `rndc reconfig` again.
Implement the required cleanup of the in-memory catalog zone during
the first `rndc reconfig`, so that the second `rndc reconfig` could
find it in an expected state.
(cherry picked from commit 43ac2cd229)
The parsing loop needs to process ISC_R_NOSPACE to properly
size the buffer. If result is still ISC_R_NOSPACE at the end
of the parsing loop set result to DNS_R_SERVFAIL.
(cherry picked from commit 08f1cba096)
In file included from rdata.c:602:
In file included from ./code.h:88:
./rdata/in_1/svcb_64.c:259:9: warning: array subscript is of type 'char' [-Wchar-subscripts]
if (!isdigit(*region->base)) {
^~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/ctype_inline.h:51:44: note: expanded from macro 'isdigit'
#define isdigit(c) ((int)((_ctype_tab_ + 1)[(c)] & _CTYPE_D))
^~~~
(cherry picked from commit d09447287f)
The dst_key_pubcompare() and dst_key_compare() didn't have a unit test,
add the unit tests which test comparing the same keys, different keys,
and, where possible, similar keys with a manually altered parameter.
dst_key_pubcompare() internally uses the *_todns() functions of the
lib/dns/openssl*_link.c modules.
dst_key_compare() internally uses the *_compare() functions of the
lib/dns/openssl*_link.c modules.
When comparing different parameters of two RSA keys there is a typo
which causes the "p" prime factors to not being compared.
Fix the typo.
(cherry picked from commit 930e4f52a5)
Previously, when lame cache would be disabled by setting lame-ttl to 0,
it would also disable lame answer detection. In this commit, we enable
the lame response detection even when the lame cache is disabled. This
enables stopping answer processing early rather than going through the
whole answer processing flow.
After receiving a new version of a catalog zone it is required
to merge it with the old version.
The algorithm walks through the new version's hash table and applies
the following logic:
1. If an entry from the new version does not exist in the old
version, then it's a new entry, add the entry to the `toadd` hash
table.
2. If the zone does not exist in the set of configured zones, because
it was deleted via rndc delzone or it was removed from another
catalog zone instance, then add into to the `toadd` hash table to
be reinstantiated.
3. If an entry from the new version also exists in the old version,
but is modified, then add the entry to the `tomod` hash table, then
remove it from the old version's hash table.
4. If an entry from the new version also exists in the old version and
is the same (unmodified) then just remove it from the old version's
hash table.
The algorithm then deletes all the remaining zones which still exist
in the old version's hash table (because only the ones that don't
exist in the new version should now remain there), then adds the ones
that were added to the `toadd`, and modifies the ones that were added
to the `tomod`, completing the merge.
During a recent refactoring, the part when the entry should be
removed from the old version's hash table on condition (4.) above
was accidentally omitted, so the unmodified zones were remaining
in the old version's hash table and consequently being deleted.
(cherry picked from commit 63145fb1d3)
It was found, that the original commit adding the setmodtime() was
incompletely squashed and there was double check for
DNS_ZONEFLG_NEEDDUMP instead of check for DNS_ZONEFLG_NEEDDUMP and
DNS_ZONEFLG_DUMPING.
Change the duplicate check to DNS_ZONEFLG_DUMPING.
(cherry picked from commit 55ac6b7394)