Previously, a data race could cause a newly created fetch context for a new client to be used
before it had been fully initialized, which would cause the query to become stuck; queries for the same
data would be either paused indefinitely or dropped because of
the `clients-per-query` limit. This has been fixed.
Closes#5053
Merge branch '5053-fetch-context-create-data-race' into 'main'
See merge request isc-projects/bind9!10146
The order of the fetch context hash table rwlock and the individual
fetch context was reversed when calling the release_fctx() function.
This was causing a problem when iterating the hash table, and thus the
ordering has been corrected in a way that the hash table rwlock is now
always locked on the outside and the fctx lock is the interior lock.
In the next commit, we need to know whether the timer has been started
or stopped. Add isc_timer_running() function that returns true if the
timer has been started.
Update Sphinx-related Python packages to their current versions pulled
in by "pip install sphinx-rtd-theme" run in a fresh virtual environment.
Merge branch 'michal/update-sphinx-and-sphinx_rtd_theme' into 'main'
See merge request isc-projects/bind9!10138
With RPZ in use, `named` could terminate unexpectedly because of a race condition when a reconfiguration command was received using `rndc`. This has been fixed.
Closes#5146
Merge branch '5146-rpz-reconfig-bug-fix' into 'main'
See merge request isc-projects/bind9!10079
After a reconfiguration the old view can be left without a valid
'rpzs' member, because when the RPZ is not changed during the named
reconfiguration 'rpzs' "migrate" from the old view into the new
view, so when a query resumes it can find that 'qctx->view->rpzs'
is NULL which query_resume() currently doesn't expect to happen if
it's recursing and 'qctx->rpz_st' is not NULL.
Fix the issue by adding a NULL-check. In order to not split the log
message to two different log messages depending on whether
'qctx->view->rpzs' is NULL or not, change the message to not log
the RPZ policy's "version" which is just a runtime counter and is
most likely not very useful for the users.
The isc_counter_create() doesn't need the return value (it was always
ISC_R_SUCCESS), use the macros to implement the reference counting,
little style cleanup, and expand the unit test.
Merge branch 'ondrej/cleanup-isc_counter-unit' into 'main'
See merge request isc-projects/bind9!10126
The isc_counter_create() doesn't need the return value (it was always
ISC_R_SUCCESS), use the macros to implement the reference counting,
little style cleanup, and expand the unit test.
Previously, when parsing responses, named incorrectly rejected responses without matching RRSIG records for NSEC/DS/NSEC3 records in the authority section. This rejection, if appropriate, should have been left for the validator to determine and has been fixed.
Closes#5185
Merge branch '5185-remove-rrsig-check-from-dns_message_parse' into 'main'
See merge request isc-projects/bind9!10125
Checking whether the authority section is properly signed should
be left to the validator. Checking in getsection (dns_message_parse)
was way too early and resulted in resolution failures of lookups
that should have otherwise succeeded.
Previously a hard-coded limitation of maximum two key or message
verification checks were introduced when checking the message's
SIG(0) signature. It was done in order to protect against possible
DoS attacks. The logic behind choosing the number 2 was that more
than a single key should only be required during key rotations, and
in that case two keys are enough. But later it became apparent that
there are other use cases too where even more keys are required, see
issue number #5050 in GitLab.
This change introduces two new configuration options for the views,
`sig0key-checks-limit` and `sig0message-checks-limit`, which define how
many keys are allowed to be checked to find a matching key, and how
many message verifications are allowed to take place once a matching
key has been found. The latter protects against expensive cryptographic
operations when there are keys with colliding tags and algorithm
numbers, with default being 2, and the former protects against a bit
less expensive key parsing operations and defaults to 16.
Closes#5050
Merge branch '5050-sig0-let-considering-more-than-two-keys' into 'main'
See merge request isc-projects/bind9!9967
Previously a hard-coded limitation of maximum two key or message
verification checks were introduced when checking the message's
SIG(0) signature. It was done in order to protect against possible
DoS attacks. The logic behind choosing the number two was that more
than one key should only be required only during key rotations, and
in that case two keys are enough. But later it became apparent that
there are other use cases too where even more keys are required, see
issue number #5050 in GitLab.
This change introduces two new configuration options for the views,
sig0key-checks-limit and sig0message-checks-limit, which define how
many keys are allowed to be checked to find a matching key, and how
many message verifications are allowed to take place once a matching
key has been found. The latter protects against expensive cryptographic
operations when there are keys with colliding tags and algorithm
numbers, with default being 2, and the former protects against a bit
less expensive key parsing operations and defaults to 16.
Running jobs which were entered into the isc_quota queue is the
responsibility of the isc_quota_release() function, which, when
releasing a previously acquired quota, checks whether the queue
is empty, and if it's not, it runs a job from the queue without touching
the 'quota->used' counter. This mechanism is susceptible to a possible
hangup of a newly queued job in case when between the time a decision
has been made to queue it (because used >= max) and the time it was
actually queued, the last quota was released. Since there is no more
quotas to be released (unless arriving in the future), the newly
entered job will be stuck in the queue.
Fix the issue by adding checks in both isc_quota_release() and
isc_quota_acquire_cb() to make sure that the described hangup does
not happen. Also see code comments.
Closes#4965
Merge branch '4965-isc_quota-bug-fix' into 'main'
See merge request isc-projects/bind9!10082
Running jobs which were entered into the isc_quota queue is the
responsibility of the isc_quota_release() function, which, when
releasing a previously acquired quota, checks whether the queue
is empty, and if it's not, it runs a job from the queue without touching
the 'quota->used' counter. This mechanism is susceptible to a possible
hangup of a newly queued job in case when between the time a decision
has been made to queue it (because used >= max) and the time it was
actually queued, the last quota was released. Since there is no more
quotas to be released (unless arriving in the future), the newly
entered job will be stuck in the queue.
Fix the wrong memory ordering for 'quota->used', as the relaxed
ordering doesn't ensure that data modifications made by one thread
are visible in other threads.
Add checks in both isc_quota_release() and isc_quota_acquire_cb()
to make sure that the described hangup does not happen. Also see
code comments.
A new option 'min-transfer-rate-in <bytes> <minutes>' has been added
to the view and zone configurations. It can abort incoming zone
transfers which run very slowly due to network related issues, for
example. The default value is set to 10240 bytes in 5 minutes.
Closes#3914
Merge branch '3914-detect-and-restart-stalled-zone-transfers' into 'main'
See merge request isc-projects/bind9!9098
Expose the average transfer rate (in bytes-per-second) during the
last full 'min-transfer-rate-in <bytes> <minutes>' minutes interval.
If no such interval has passed yet, then the overall average rate is
reported instead.
Add a new big zone, run a zone transfer in slow mode, and check
whether the zone transfer gets canceled because 100000 bytes are
not transferred in 5 seconds (as it's running in slow mode).
This new option sets a minimum amount of transfer rate for
an incoming zone transfer that will abort a transfer, which
for some network related reasons run very slowly.
The cache has been updated so that if new data is rejected - for example, because there was already existing data at a higher trust level - then its covering RRSIG will also be rejected.
Closes#5132
Merge branch '5132-improve-cd-behavior' into 'main'
See merge request isc-projects/bind9!9999
Add a new dns_rdataset_equals() function to check whether two
rdatasets are equal in DNSSEC terms.
When an rdataset being cached is rejected because its trust
level is lower than the existing rdataset, we now check to see
whether the rejected data was identical to the existing data.
This allows us to cache a potentially useful RRSIG when handling
CD=1 queries, while still rejecting RRSIGs that would definitely
have resulted in a validation failure.
Rdata slabs used in the QP databases are usually prepended with a slab header, but are sometimes "raw", containing only the rdata and no header. Previously, to allow for them to be used both ways, functions that operated on them took a `reservelen` argument, which would be set to either the header length or to zero, and skipped over that many bytes at the beginning of the buffer. Most such functions were never used on the raw form. To make the code clearer, each of these functions now operates on full slabs with headers, and an alternate "raw" version of the function has been added in cases where that was needed.
In addition, the `dns_rdataslab_merge()` and `_subtract()` functions have been rewritten for clarity and efficiency, and a minor bug has been fixed in `dns_rdataslab_equal()` and `_equalx()`, which could cause an incorrect result if both slabs being compared had zero length.
Merge branch 'each-refactor-rdataslab' into 'main'
See merge request isc-projects/bind9!10084
The "raw" version of the header was used for the noqname and the closest
proofs to save around 152 bytes of the dns_slabheader_t while bringing
an additional complexity. Remove the raw version of the dns_slabheader
API at the slight expense of having unused dns_slabheader_t data sitting
in front of the proofs.
reduce the number of rdata comparisons needed by walking
through the original slab once to determine whether the rdata
in it is duplicated in the slab to be subtracted, and then
write out the rdatas that aren't. previously, this was
done twice: once when determining the size of the target buffer
and then again when copying data into it.
when merging two rdata slabs, we now check once to see
whether an item in the new slab has a duplicate in the
old. previously this was done twice; once to determine the
size of the target buffer required, and then again when
copying the data into it.
we also minimize the number of rdata comparisons necessary,
by remembering which items in the old slab have already been
found to be duplicates.
The function name dns_slabheader_fromrdataset() was too similar
to dns_rdataslab_fromrdataset(). Instead, we now have an rdataset
method 'getheader' which is implemented for slab-type rdatasets.
A new NOHEADER rdataset attribute is set for rdatasets using
raw slabs (i.e., noqname and closest encloser proofs); when
called on rdatasets with that flag set, dns_rdataset_getheader()
returns NULL.
when dns_rdataslab_fromrdataset() is run, in addition to
allocating space for a slab header, it also partially
initializes it, setting the type match rdataset->type and
rdataset->covers, the trust to rdataset->trust, and the TTL to
rdataset->ttl.
there are now two functions for creating an rdataslab from an
rdataset: dns_rdataslab_fromrdataset() creates a full slab (including
space for a slab header), and dns_rdataslab_raw_fromrdataset() creates
a raw slab.
- there are now two functions for getting rdataslab size:
dns_rdataslab_size() is for full slabs and dns_rdataslab_sizeraw()
for raw slabs. there is no longer a need for a reservelen parameter.
- dns_rdataslab_count() also no longer takes a reservelen parameter.
(currently it's never used for raw slabs, so there is no _countraw()
function.)
- dns_rdataslab_rdatasize() has been removed, because
dns_rdataslab_sizeraw() can do the same thing.
- dns_rdataslab_merge() and dns_rdataslab_subtract() both take
slabheader parameters instead of character buffers, and the
reservelen parameter has been removed.
if both rdataslabs being compared have zero length, return true.
also, since these functions are only ever called on slabheaders
with sizeof(dns_slabheader_t) as the reserve length, we can
simplify the API: remove the reservelen argument, and pass the
slabs as type dns_slabheader_t * instead of unsigned char *.
The dns_slabheader object uses the 'next' pointer for two purposes.
In the first header for any given type, 'next' points to the first
header for the next type. But 'down' points to the next header of
the same type, and in that record, 'next' points back up.
This design made the code confusing to read. We now use a union
so that the 'next' pointer can also be called 'up'.
The value returned by http_send_outgoing() is not used anywhere, so we
make it not return anything (void). Probably it is an omission from
older times.
When handling outgoing data, there were a couple of rarely executed
code paths that would not take into account that the callback MUST be
called.
It could lead to potential memory leaks and consequent shutdown hangs.
This commit changes the way how the number of active HTTP streams is
calculated and allows it to scale with the values of the maximum
amount of streams per connection, instead of effectively capping at
STREAM_CLIENTS_PER_CONN.
The original limit, which is intended to define the pipelining limit
for TCP/DoT. However, it appeared to be too restrictive for DoH, as it
works quite differently and implements pipelining at protocol level by
the means of multiplexing multiple streams. That renders each stream
to be effectively a separate connection from the point of view of the
rest of the codebase.
Previously we would limit the amount of incoming data to process based
solely on the presence of not completed send requests. That worked,
however, it was found to severely degrade performance in certain
cases, as was revealed during extended testing.
Now we switch to keeping track of how much data is in flight (or ready
to be in flight) and limit the amount of processed incoming data when
the amount of in flight data surpasses the given threshold, similarly
to like we do in other transports.
In the qpzone implementation of `dns_db_closeversion()`, if there are changed nodes that have no remaining data, delete them.
Closes#5169
Merge branch '5169-qpzone-delete-dead-nodes' into 'main'
See merge request isc-projects/bind9!10089
Removed some code from the cache database implementation that was left over from before it and the zone database implementation were separated.
Merge branch 'each-qpcache-refactor' into 'main'
See merge request isc-projects/bind9!9991
Database versions are not used in cache databases. Some places in
qpcache.c required the version argument to be NULL; others marked it
as UNUSED. Unify all cases to require version to be NULL.
Add new related_headers() function that simplifies the code
flow in qpcache_findrdataset(). Also use check_stale_header() function
to remove code duplication.
Add a helper function both_headers() that unifies the slabheader
matching for simple type: it returns true when both the type and
the matching RRSIG have been found.