There's a mismatch between the atomic and non-atomic types that could
potentialy lead to a rwlock deadlock (after two billion 2^32) writes.
Use int_fast32_t when loading the atomic_int_fast32_t types in the
isc_rwlock unit.
When a response times out the fctx_cancelquery() function
incorrectly calculates it in the 'dns_resstatscounter_queryrtt5'
counter (i.e. >=1600 ms). To avoid this, the rctx_timedout()
function should make sure that 'rctx->finish' is NULL. And in order
to adjust the RTT values for the timed out server, 'rctx->no_response'
should be true. Update the rctx_timedout() function to make those
changes.
(cherry picked from commit 830e548111)
The resquery_response() function increases the response counter without
checking if the response was successful. Increase the counter only when
the result indicates success.
(cherry picked from commit 12e7dfa397)
the fix in commit 1edbbc32b4 was incomplete; the wrong
event result could also be set in cache_name() and validated().
(cherry picked from commit 9ebeb60174)
when the caching of a negative record failed because of the
presence of a positive one, ncache_adderesult() could override
this to ISC_R_SUCCESS. this could cause CNAME and DNAME responses
to be handled incorrectly. ncache_adderesult() now sets the result
code correctly in such cases.
(cherry picked from commit 1edbbc32b4)
All DNSKEY keys are able to authenticate. The DNS_KEYTYPE_NOAUTH
(and DNS_KEYTYPE_NOCONF) flags were defined for the KEY rdata type,
and are not applicable to DNSKEY.
Previously, because the DNSKEY implementation was built on top of
KEY, the NOAUTH flag prevented authentication in DNSKEYs as well.
This has been corrected.
(cherry picked from commit 5c21576f82)
Use enums for DNS_KEYFLAG_, DNS_KEYTYPE_, DNS_KEYOWNER_, DNS_KEYALG_,
and DNS_KEYPROTO_ values.
Remove values that are never used.
Eliminate the obsolete DNS_KEYFLAG_SIGNATORYMASK. Instead, add three
more RESERVED bits for the key flag values that it covered but which
were never used.
(cherry picked from commit fee1ba40df)
This commit simplifies code flow in the tls_cycle_input() and makes
the incoming data processing similar to that in TCP DNS. In
particular, now we decipher all the the incoming data before making a
single isc__nm_process_sock_buffer() call. Previously we would try to
decipher data bit-by-bit before trying to process the deciphered bit
via isc__nm_process_sock_buffer(). Doing like before made the code
much less predictable, in particular in the areas like when reading is
paused or resumed.
The newer approach also allowed us to get rid of some old kludges.
When -T cookiealwaysvalid is passed to named, DNS cookie checks for
the incoming queries always pass, given they are structurally correct.
(cherry picked from commit 807ef8545d)
Add missing locks in dns_zone_getxfrsource4 et al. Addresses CID
468706, 468708, 468741, 468742, 468785 and 468778.
Cleanup dns_zone_setxfrsource4 et al to now return void.
Remove double copies with dns_zone_getprimaryaddr and dns_zone_getsourceaddr.
(cherry picked from commit d0a59277fb)
ZONEMD needs to be able to digest SIG and RRSIG records. The signer
field can be compressed in SIG so we need to call dns_name_digest().
While for RRSIG the records the signer field is not compressed the
canonical form has the signer field downcased (RFC 4034, 6.2). This
also implies that compare_rrsig needs to downcase the signer field
during comparison.
(cherry picked from commit 006c5990ce)
Change all the non-locked operations on 'quota->used' and
'quota->waiting' to "acq/rel" for inter-thread synchronization. Some
loads are left as "relaxed", because they are under a locked mutex
which also provides protection.
This commit bumps the total number of active streams (= the opened
streams for which a request is received, but response is not ready) to
60% of the total streams limit.
The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.
(cherry picked from commit eaad0aefe6)
The check, while not active by default, is not valid since the commit
8b8f4d500d.
See 'if (total == 0) { ...' below branch to understand why.
(cherry picked from commit 217a1ebd79)
Previously, the code would try to avoid sending any data regardless of
what it is unless:
a) The flush limit is reached;
b) There are no sends in flight.
This strategy is used to avoid too numerous send requests with little
amount of data. However, it has been proven to be too aggressive and,
in fact, harms performance in some cases (e.g., on longer (≥1h) runs
of "stress:long:rpz:doh+udp:linux:*").
Now, additionally to the listed cases, we also:
c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).
That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.
(cherry picked from commit c5f7968856)
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.
This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.
(cherry picked from commit 0e1b02868a)
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that the way it was structured
before was indirectly caused by the presence of the missing callback
calls bug, fixed in 8b8f4d500d.
The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.
Additionally, it fixes a similar issue with potentially missing send
callback calls processing and hardens the code against use-after-free
errors related to the session object (they can potentially occur).
(cherry picked from commit 0956fb9b9e)
if the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on shut
down.
this has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
(cherry picked from commit c2e4358267)
A change in 6aba56ae8 (checking whether a rejected RRset was identical
to the data it would have replaced, so that we could still cache a
signature) inadvertently introduced cases where processing of a
response would continue when previously it would have been skipped.
(cherry picked from commit d0fd9cbe3b)
After a reconfiguration the old view can be left without a valid
'rpzs' member, because when the RPZ is not changed during the named
reconfiguration 'rpzs' "migrate" from the old view into the new
view, so when a query resumes it can find that 'qctx->view->rpzs'
is NULL which query_resume() currently doesn't expect to happen if
it's recursing and 'qctx->rpz_st' is not NULL.
Fix the issue by adding a NULL-check. In order to not split the log
message to two different log messages depending on whether
'qctx->view->rpzs' is NULL or not, change the message to not log
the RPZ policy's "version" which is just a runtime counter and is
most likely not very useful for the users.
(cherry picked from commit 3ea2fbc238)
Checking whether the authority section is properly signed should
be left to the validator. Checking in getsection (dns_message_parse)
was way too early and resulted in resolution failures of lookups
that should have otherwise succeeded.
(cherry picked from commit 83159d0a54)
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.
When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.
(cherry picked from commit 6aba56ae89)
The value returned by http_send_outgoing() is not used anywhere, so we
make it not return anything (void). Probably it is an omission from
older times.
(cherry picked from commit 2adabe835a)
When handling outgoing data, there were a couple of rarely executed
code paths that would not take into account that the callback MUST be
called.
It could lead to potential memory leaks and consequent shutdown hangs.
(cherry picked from commit 8b8f4d500d)
This commit changes the way how the number of active HTTP streams is
calculated and allows it to scale with the values of the maximum
amount of streams per connection, instead of effectively capping at
STREAM_CLIENTS_PER_CONN.
The original limit, which is intended to define the pipelining limit
for TCP/DoT. However, it appeared to be too restrictive for DoH, as it
works quite differently and implements pipelining at protocol level by
the means of multiplexing multiple streams. That renders each stream
to be effectively a separate connection from the point of view of the
rest of the codebase.
(cherry picked from commit a22bc2d7d4)
Previously we would limit the amount of incoming data to process based
solely on the presence of not completed send requests. That worked,
however, it was found to severely degrade performance in certain
cases, as was revealed during extended testing.
Now we switch to keeping track of how much data is in flight (or ready
to be in flight) and limit the amount of processed incoming data when
the amount of in flight data surpasses the given threshold, similarly
to like we do in other transports.
(cherry picked from commit 05e8a50818)
If a deferred validation on data that was originally queried with
CD=1 fails, we now repeat the query, since the zone data may have
changed in the meantime.
(cherry picked from commit 04b1484ed8)
When a query is made with CD=1, we store the result in the
cache marked pending so that it can be validated later, at
which time it will either be accepted as an answer or removed
from the cache as invalid. Deferred validation was not
attempted when there were no cached RRSIGs for DNSKEY and
DS. We now complete the deferred validation in this scenario.
(cherry picked from commit 8b900d1808)
Views use two types of reference counting - regular and weak, and
when there are no more regular references, the view_flushanddetach()
function destroys or detaches some parts of the view, including
'view->zonetable', while other parts are freed by destroy() when
the last weak reference is detached. Since catalog zones use weak
references to attach a view, it's currently possible that during
shutdown catalog zone processing will try to add a new zone into
an otherwise unused view (because it's shutting down) which doesn't
have an attached zonetable any more. This could cause an assertion
failure. Fix this issue by modifying the dns_view_addzone() function
to expect that 'view->zonetable' can be NULL, and in that case just
return ISC_R_SHUTTINGDOWN.
prio_type was being used in the wrong place to optimize cname_and_other.
We have to first exclude and accepted types and we also have to
determine that the record exists before we can check if we are at
a point where a later CNAME cannot appear.
(cherry picked from commit 5e49a9e4ae)
In #1870, the expiration time of ANCIENT records were printed, but
actually the ancient records are very short lived, and the information
carries a little value.
Instead of printing the expiration of ANCIENT records, print the
expiration time of STALE records.
When the mark_header_ancient() helper function was introduced, couple of
places with duplicate (or almost duplicate) code was missed. Add
missing set_ttl() calls before mark_header_ancient(), so the handling of
expiring headers is same in all places.
(concept cherry picked from commit 58179e6a19)
If we know that the header has ZEROTTL set, the server should never send
stale records for it and the TTL should never be anything else than 0.
The comment was already there, but the code was not matching the
comment.
(cherry picked from commit cfee6aa565)
When the header has been marked as ANCIENT, but the ttl hasn't been
reset (this happens in couple of places), the rdataset TTL would be
set to the header timestamp instead to a reasonable TTL value.
Since this header has been already expired (ANCIENT is set), set the
rdataset TTL to 0 and don't reuse this field to print the expiration
time when dumping the cache. Instead of printing the time, we now
just print 'expired (awaiting cleanup'.
(cherry picked from commit 1bbb57f81b)
the search for the deepest known zone cut in the cache could
improperly reject a node containing stale data, even if the
NS rdataset wasn't the data that was stale.
this change also improves the efficiency of the search by
stopping it when both NS and RRSIG(NS) have been found.
(cherry picked from commit 1f095b902c)
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:
- new_reference() and decrement_references() are now called
dns__rbtnode_acquire() and dns__rbtnode_release()
respectively; this reflects the fact that they modify both
the internal and external reference counters for a node.
- rbtnode_newref() and rbtnode_decref are now called
rbtnode_erefs_increment() and rbtnode_erefs_decrement(),
to reflect that they only increase and decrease the node's
external reference counters, not internal.
(cherry picked from commit 857225aeb6)
Refactor the pattern in the newref() and decref() functions in rbtdb.c
following the pattern, so it follows the similar pattern we already have
for QPDB.
(cherry picked from commit 9c45de9473)
dns_adb_flushname() calls dns_name_hash() to determine the ADB bucket
number to search for the given name. Meanwhile, all other functions in
lib/dns/adb.c call dns_name_fullhash() for determining the bucket number
instead. This discrepancy causes dns_adb_flushname() to have virtually
no chances of actually removing the given name from the ADB if the
name is longer than 16 bytes (since dns_name_hash() only hashes the
first 16 bytes of the name provided to it) - more specifically, the
probability of success for names longer than 16 bytes is inversely
proportional to the number of ADB buckets in use, i.e. 1:1021 at best.
Fix by using dns_name_fullhash() instead of dns_name_hash() in
dns_adb_flushname(), so that the logic for determining the bucket number
that a given name belongs to is consistent throughout lib/dns/adb.c.
When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer. As there are
multiple places where fctx_shutdown() is being called without stopping
the timer, move the fctx_stoptimer() to fctx_shutdown() and cleanup the
explicit usage.
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks). This could lead to a situation
where the resolver-heavy load would force evict ADB objects from the
database to point where ADB is completely empty, leading to even more
resolver-heavy load.
Make the short lived ADB objects use the other memory context that we
already created for the hashmaps. This makes the ADB overmem condition
to not be triggered by the ongoing resolver fetches.
(cherry picked from commit 05faff6d53)
When the recursive-clients value is too large, the linked lists holding
the fetch contexts can also grow large and since the algorithm to merge
outgoing queries is quadratic, named can get slow.
Replace the linked list with hashtable for faster lookups. This also
allows us to reduce the number of tasks (buckets) in the resolver.
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0. In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.
Co-authored-by: JINMEI Tatuya <jtatuya@infoblox.com>
Co-authored-by: Ondřej Surý <ondrej@isc.org>
(cherry picked from commit 7f4471594d)
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
The last remaining tuning value was RESOLVER_NTASKS and instead of
having variable number of the tasks per-cpu and in named and in
dns_client, set the number of the resolver tasks to 523 (number taken
from dns_client unit) to accomodate most of the recursive-clients
values.
The uv_req union member of struct isc__nm_uvreq contained libuv request
types that we don't use. Turns out that uv_getnameinfo_t is 1000 bytes
big and unnecessarily enlarged the whole structure. Remove all the
unused members from the uv_req union.
After removing sockaddr_unix from isc_sockaddr, we can also remove
sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just
48 bytes needed to hold IPv6 addresses.
(cherry picked from commit 2367b6a2e1)
We started using isc_nm_bad_request() more actively throughout
codebase. In the case of HTTP/2 it can lead to a large count of
useless "Bad Request" messages in the BIND log, as often we attempt to
send such request over effectively finished HTTP/2 sessions.
This commit fixes that.
(cherry picked from commit 937b5f8349)