Replace the isc_mutex in the dns_adb unit with isc_rwlock for better
performance. Both ADB names and ADB entries hashtables and LRU are now
using isc_rwlock.
This changes the internal isc_rwlock implementation to:
Irina Calciu, Dave Dice, Yossi Lev, Victor Luchangco, Virendra
J. Marathe, and Nir Shavit. 2013. NUMA-aware reader-writer locks.
SIGPLAN Not. 48, 8 (August 2013), 157–166.
DOI:https://doi.org/10.1145/2517327.24425
(The full article available from:
http://mcg.cs.tau.ac.il/papers/ppopp2013-rwlocks.pdf)
The implementation is based on the The Writer-Preference Lock (C-RW-WP)
variant (see the 3.4 section of the paper for the rationale).
The implemented algorithm has been modified for simplicity and for usage
patterns in rbtdb.c.
The changes compared to the original algorithm:
* We haven't implemented the cohort locks because that would require a
knowledge of NUMA nodes, instead a simple atomic_bool is used as
synchronization point for writer lock.
* The per-thread reader counters are not being used - this would
require the internal thread id (isc_tid_v) to be always initialized,
even in the utilities; the change has a slight performance penalty,
so we might revisit this change in the future. However, this change
also saves a lot of memory, because cache-line aligned counters were
used, so on 32-core machine, the rwlock would be 4096+ bytes big.
* The readers use a writer_barrier that will raise after a while when
readers lock can't be acquired to prevent readers starvation.
* Separate ingress and egress readers counters queues to reduce both
inter and intra-thread contention.
The dns_badcache was pulling the <isc/atomic.h> header only indirectly
via <isc/rwlock.h>, add the direct include as the <isc/rwlock.h> no
longer pulls the header when pthread_rwlock is used.
I misunderstood the purpose of the `heap_index` rdataset header
member; I thought it identified which heap to use, and could therefore
be smaller, the same size as `locknum` indexes. But in fact it is a
position within a heap, so it needs to be able to count up to the
total number of rdatasets in the rbtdb.
So this changes `heap_index` from `uint16_t` back to `unsigned int`.
To avoid re-embiggening the rdatasetheader, shrink the `count` member
from `uint32` to `uint16`. The `count` is used to rotate RRsets in
`dns_rdataset_towiresorted()`, so 16 bits is more than large enough.
This change also means we no longer need to avoid colliding with
`DNS_RDATASET_COUNT_UNDEFINED` i.e. UINT32_MAX.
Closes#3862
Unfortunately, C still lacks a standard function for pause (x86,
sparc) or yeild (arm) instructions, for use in spin lock or CAS loops.
BIND has its own based on vendor intrinsics or inline asm.
Previously, it was buried in the `isc_rwlock` implementation. This
commit renames `isc_rwlock_pause()` to `isc_pause()` and moves
it into <isc/pause.h>.
This commit also fixes the configure script so that it detects ARM
yield support on systems that identify as `aarch*` instead of `arm*`.
On 64-bit ARM systems we now use the ISB (instruction synchronization
barrier) instruction in preference to yield. The ISB instruction
pauses the CPU for longer, several nanoseconds, which is more like the
x86 pause instruction. There are more details in a Rust pull request,
which also refers to MySQL making the same change:
https://github.com/rust-lang/rust/pull/84725
A dns_rpz_unref_rpzs() call is missing when taking the 'goto unlock;'
path on shutdown, in order to compensate for the earlier
dns_rpz_ref_rpzs() call.
Move the dns_rpz_ref_rpzs() call after the shutdown check.
When there are multiple managed trust anchors we need to know the
name of the trust anchor that is failing. Extend the error message
to include the trust anchor name.
removed some functions that are no longer used and unlikely to
be resurrected, and also some that were only used to support Windows
and can now be replaced with generic versions.
Add magic value to the fctxcount, to check for completely invalid
counters, or counters that have been already destroyed.
Improve the locking around the counters, and because of that we can drop
the atomics and use simple integers - the counters were already locked
and the tiny bits that used the atomics were not worth the extra effort.
- removed documentation of -S option from named man page
- removed documentation of reserved-sockets from ARM
- simplified documentation of dnssec-secure-to-insecure - it
now just says it's obsolete rather than describing what it
doesn't do anymore
- marked three formerly obsolete options as ancient:
parent-registration-delay, reserved-sockets, and
suppress-initial-notify
isc_bind9 was a global bool used to indicate whether the library
was being used internally by BIND or by an external caller. external
use is no longer supported, but the variable was retained for use
by dyndb, which needed it only when being built without libtool.
building without libtool is *also* no longer supported, so the variable
can go away.
libuv support for receiving multiple UDP messages in a single system
call (recvmmsg()) has been tweaked several times between libuv versions
1.35.0 and 1.40.0. Mixing and matching libuv versions within that span
may lead to assertion failures and is therefore considered harmful, so
try to limit potential damage be preventing users from mixing libuv
versions with distinct sets of recvmmsg()-related flags.
The implementation of UDP recvmmsg in libuv 1.35 and 1.36 is
incomplete and could cause assertion failure under certain
circumstances.
Modify the configure and runtime checks to report a fatal error when
trying to compile or run with the affected versions.
Instead of using an extra rarely-used paramater to dns_clientinfo_init()
to set ECS information for a client, this commit adds a function
dns_clientinfo_setecs() which can be called only when ECS is needed.
A recent refactoring in 7e4e125e5e
had introduced a logical error which could result in calling the
dns_resolver_createfetch() function with 'nameservers' pointer set
to NULL, but with 'domain' not set to NULL, which is not allowed
by the function.
Make sure 'domain' is set only when 'nsrdataset' is valid.
the optional 'port' option, when used with notify-source,
transfer-source, etc, is used to set up UDP dispatches with a
particular source port, but when the actual UDP connection was
established the port would be overridden with a random one. this
has been fixed.
(configuring source ports is deprecated in 9.20 and slated for
removal in 9.22, but should still work correctly until then.)
the built-in trust anchors in named and delv are sufficent for
validation. named still needs to be able to load trust anchors from
a bind.keys file for testing purposes, but it doesn't need to be
the default behavior.
we now only load trust anchors from a file if explicitly specified
via the "bindkeys-file" option in named or the "-a" command line
argument to delv. documentation has been cleaned up to remove references
to /etc/bind.keys.
Closes#3850.
it was possible for a managed trust anchor needing to send a key
refresh query to be unable to do so because an authoritative zone
was not yet loaded. this has been corrected by delaying the
synchronization of managed-keys zones until after all zones are
loaded.
* rbt node chains were sized to allow for bitstring labels, so they
had 256 levels; but in the absence of bistrings, 128 is enough.
* dns_byaddr_createptrname() had a redundant options argument,
and a very outdated doc comment.
* A number of comments referred to bitstring labels in a way that is
no longer helpful. (A few informative comments remain.)
ISC_MEM_ZERO requires great care to use when the space returned by
the allocator is larger than the requested space, and when memory is
reallocated. You must ensure that _every_ call to allocate or
reallocate a particular block of memory uses ISC_MEM_ZERO, to ensure
that the extra space is zeroed as expected. (When ISC_MEMFLAG_FILL
is set, the extra space will definitely be non-zero.)
When BIND is built without jemalloc, ISC_MEM_ZERO is implemented in
`jemalloc_shim.h`. This had a bug on systems that have malloc_size()
or malloc_usable_size(): memory was only zeroed up to the requested
size, not the allocated size. When an oversized allocation was
returned, and subsequently reallocated larger, memory between the
original requested size and the original allocated size could
contain unexpected nonzero junk. The realloc call does not know the
original requested size and only zeroes from the original allocated
size onwards.
After this change, `jemalloc_shim.h` always zeroes up to the
allocated size, not the requested size.
Commit 7695c36a5d added a new parameter,
'options', to the prototype of the 'allrdatasets' function pointer in
struct dns_dbmethods. Handle this new parameter accordingly in
rpsdb_allrdatasets().
the rate limter now uses loop callbacks rather than task events.
the API for isc_ratelimiter_enqueue() has been changed; we now pass
in a loop, a callback function and a callback argument, and
receive back a rate limiter event object (isc_rlevent_t). it
is no longer necessary for the caller to allocate the event.
the callback argument needs to include a pointer to the rlevent
object so that it can be freed using isc_rlevent_free(), or by
dequeueing.
The ADB hashmaps are stored in extra memory contexts, so the hash
tables are excluded from the overmem accounting. The new memory
context was unnamed, give it a proper name.
Same thing has happened with extra memory context used for named
global log context - give the extra memory context a proper name.
Set the DS state after issuing 'rndc dnssec -checkds'. If the DS
was published, it should go in RUMOURED state, regardless whether it
is already safe to do so according to the state machine.
Leaving it in HIDDEN (or if it was magically already in OMNIPRESENT or
UNRETENTIVE) would allow for easy shoot in the foot situations.
Similar, if the DS was withdrawn, the state should be set to
UNRETENTIVE. Leaving it in OMNIPRESENT (or RUMOURED/HIDDEN)
would also allow for easy shoot in the foot situations.
Add check for extracting the public 'n' component on OpenSSL 3.0
path. This is mandatory component, and it's presence is checked
already on the other code path.
Also document the reason why private key component getting errors
are ignored.
This restores the Malloced memory counter and it's now always equal to
InUse counter. This is only for backwards compatibility reason and
there is no separate counter.
The commit also cleanups little things like structure with a single
item (summary.inuse), and shuts up a wrong cppcheck warning (the
notorious NULL check after assignment).
Instead of enforcing stronger synchronization between threads, make all
the atomic operations relaxed. We are not really interested in exact
numbers at all times - the single place where we need the exact number
is when the memory context is being destroyed. Even when there's a
overmem counter, we don't care about exact ordering or exact number.
The Lost memory counter would count the memory "lost" by external
libraries. There's really no such thing as `named` require the memory
contexts to be clean on destroy.
The stats buckets were again more useful for internal allocator, because
we would see the individual "block" caches where the allocations would
fall into. Remove the stats buckets, and if needed, we can pull more
detailed statistics out of the jemalloc.
The total memory counter had again little or no meaning when we removed
the internal memory allocator. It was just a monotonic counter that
would count add the allocation sizes but never subtracted anything, so
it would be just a "big number".
The maxinuse memory counter indicated the highest amount of
memory allocated in the past. Checking and updating this high-
water mark value every time memory was allocated had an impact
on server performance, so it has been removed. Memory size can
be monitored more efficiently via an external tool logging RSS.
The malloced and maxmalloced memory counters were mostly useless since
we removed the internal allocator blocks - it would only differ from
inuse by the memory context size itself.
Both receive_secure_serial() and setnsec3param() run on the same zone
loop, therefore they are serialized. Remove the mechanism to enqueue
the nsec3param and secure serial updates in case one of them is
running (as they can not) and replace it with sanity check.