Commit graph

13285 commits

Author SHA1 Message Date
Ondřej Surý
ec86759401 Replace netmgr per-protocol sequential function with a common one
Previously, each protocol (TCPDNS, TLSDNS) has specified own function to
disable pipelining on the connection.  An oversight would lead to
assertion failure when opcode is not query over non-TCPDNS protocol
because the isc_nm_tcpdns_sequential() function would be called over
non-TCPDNS socket.  This commit removes the per-protocol functions and
refactors the code to have and use common isc_nm_sequential() function
that would either disable the pipelining on the socket or would handle
the request in per specific manner.  Currently it ignores the call for
HTTP sockets and causes assertion failure for protocols where it doesn't
make sense to call the function at all.
2021-06-22 17:21:44 +03:00
Matthijs Mekking
42c601ae14 Fix deadlock issue with key-directory and in-view
When locking key files for a zone, we iterate over all the views and
lock a mutex inside the zone structure. However, if we envounter an
in-view zone, we will try to lock the key files twice, one time for
the home view and one time for the in-view view. This will lead to
a deadlock because one thread is trying to get the same lock twice.
2021-06-22 09:04:11 +02:00
Mark Andrews
d1e283ede1 Checking of key-directory and dnssec-policy was broken
the checks failed to account for key-directory being inheritable.
2021-06-18 16:46:02 +10:00
Mark Andrews
08eeebb6a7 Add w and W to maptoupper and maptolower tables 2021-06-18 11:29:45 +10:00
Michał Kępień
6b77583f54 Allow resetting hash table size limits for DNS DBs
When "max-cache-size" is changed to "unlimited" (or "0") for a running
named instance (using "rndc reconfig"), the hash table size limit for
each affected cache DB is not reset to the maximum possible value,
preventing those hash tables from being allowed to grow as a result of
new nodes being added.

Extend dns_rbt_adjusthashsize() to interpret "size" set to 0 as a signal
to remove any previously imposed limits on the hash table size.  Adjust
API documentation for dns_db_adjusthashsize() accordingly.  Move the
call to dns_db_adjusthashsize() from dns_cache_setcachesize() so that it
also happens when "size" is set to 0.
2021-06-17 17:09:37 +02:00
Michał Kępień
c096f91451 Allow hash tables for cache RBTs to be grown
Upon creation, each dns_rbt_t structure has its "maxhashbits" field
initialized to the value of the RBT_HASH_MAX_BITS preprocessor macro,
i.e. 32.  When the dns_rbt_adjusthashsize() function is called for the
first time for a given RBT (for cache RBTs, this happens when they are
first created, i.e. upon named startup), it lowers the value of the
"maxhashbits" field to the number of bits required to index the
requested number of hash table slots.  When a larger hash table size is
subsequently requested, the value of the "maxhashbits" field should be
increased accordingly, up to RBT_HASH_MAX_BITS.  However, the loop in
the rehash_bits() function currently ensures that the number of bits
necessary to index the resized hash table will not be larger than
rbt->maxhashbits instead of RBT_HASH_MAX_BITS, preventing the hash table
from being grown once the "maxhashbits" field of a given dns_rbt_t
structure is set to any value lower than RBT_HASH_MAX_BITS.

Fix by tweaking the loop guard condition in the rehash_bits() function
so that it compares the new number of bits used for indexing the hash
table against RBT_HASH_MAX_BITS rather than rbt->maxhashbits.
2021-06-17 17:09:37 +02:00
Ondřej Surý
54c389dbc0 Drop support for clang atomic and gcc __sync builtins
The requirements for BIND 9.17+ now requires C11 support from the
compiler, so we can safely drop most of the stdatomic.h shims from
lib/isc/unix/include/stdatomic.h.

This commit removes support for clang atomic builtins (clang >= 3.6.0
includes stdatomic.h header) and for Gcc __sync builtins.

The only compatibility shim that remains is support for __atomic
builtins for Gcc >= 4.7.0 since CentOS 7 still includes only Gcc 4.8.1
and the proper stdatomic.h header was only introduced in Gcc >= 4.9.
2021-06-17 09:51:04 +02:00
Ondřej Surý
4677bb28d1 Remove atomics emulated by a mutex-locked variable
Mutex atomics were intended to be used as a debugging tool only
and it has already served its purpose and it's not needed anymore.
2021-06-17 09:51:04 +02:00
Artem Boldariev
dc356bb196 Fix ASAN error in DoH (passing NULL to memmove())
The warning was produced by an ASAN build:

runtime error: null pointer passed as argument 2, which is declared to
never be null

This commit fixes it by checking if nghttp2_session_mem_send() has
actually returned anything.
2021-06-16 17:46:10 +03:00
Mark Andrews
234ad2d075 Lock access to task->threadid 2021-06-15 00:01:58 +00:00
Artem Boldariev
ccd2267b1c Set sock->iface and sock->peer properly for layered connection types
This change sets the mentioned fields properly and gets rid of klusges
added in the times when we were keeping pointers to isc_sockaddr_t
instead of copies. Among other things it helps to avoid a situation
when garbage instead of an address appears in dig output.
2021-06-14 11:37:36 +03:00
Artem Boldariev
b84fa122ce Make BIND refuse to serve XFRs over DoH
We cannot use DoH for zone transfers.  According to RFC8484 a DoH
request contains exactly one DNS message (see Section 6: Definition of
the "application/dns-message" Media Type,
https://datatracker.ietf.org/doc/html/rfc8484#section-6).  This makes
DoH unsuitable for zone transfers as often (and usually!) these need
more than one DNS message, especially for larger zones.

As zone transfers over DoH are not (yet) standardised, nor discussed
in RFC8484, the best thing we can do is to return "not implemented."

Technically DoH can be used to transfer small zones which fit in one
message, but that is not enough for the generic case.

Also, this commit makes the server-side DoH code ensure that no
multiple responses could be attempted to be sent over one HTTP/2
stream. In HTTP/2 one stream is mapped to one request/response
transaction. Now the write callback will be called with failure error
code in such a case.
2021-06-14 11:37:36 +03:00
Artem Boldariev
009752cab0 Pass an HTTP handle to the read callback when finishing a stream
This commit fixes a leftover from an earlier version of the client-side
DoH code when the underlying transport handle was used directly.
2021-06-14 11:37:36 +03:00
Artem Boldariev
d5d20cebb2 Fix a crash in the client-side DoH code (header processing callback)
Support a situation in header processing callback when client side
code could receive a belated response or part of it. That could
happen when the HTTP/2 session was already closed, but there were some
response data from server in flight. Other client-side nghttp2
callbacks code already handled this case.

The bug became apparent after HTTP/2 write buffering was supported,
leading to rare unit test failures.
2021-06-14 11:37:33 +03:00
Artem Boldariev
2dfc0d9afc Nullify connect.cstream in time and keep track of all client streams
This commit ensures that sock->h2.connect.cstream gets nullified when
the object in question is deleted. This fixes a nasty crash in dig
exposed when receiving large responses leading to double free()ing.

Also, it refactors how the client-side code keeps track of client
streams (hopefully) preventing from similar errors appearing in the
future.
2021-06-14 11:37:29 +03:00
Artem Boldariev
5b507c1136 Fix BIND to serve large HTTP responses
This commit makes NM code to report HTTP as a stream protocol. This
makes it possible to handle large responses properly. Like:

dig +https @127.0.0.1 A cmts1-dhcp.longlines.com
2021-06-14 11:37:17 +03:00
Ondřej Surý
b3de93e54c Update the source code formatting using clang-format-12
clang-format now tries to keep the type-cast on the same line as the
variable.  Update the formatting.
2021-06-13 08:46:28 +02:00
Michał Kępień
7a87bf468b Fix "no DS" proofs for wildcard+CNAME delegations
When answering a query requires wildcard expansion, the AUTHORITY
section of the response needs to include NSEC(3) record(s) proving that
the QNAME does not exist.

When a response to a query is an insecure delegation, the AUTHORITY
section needs to include an NSEC(3) proof that no DS record exists at
the parent side of the zone cut.

These two conditions combined trip up the NSEC part of the logic
contained in query_addds(), which expects the NS RRset to be owned by
the first name found in the AUTHORITY section of a delegation response.
This may not always be true, for example if wildcard expansion causes an
NSEC record proving QNAME nonexistence to be added to the AUTHORITY
section before the delegation is added to the response.  In such a case,
named incorrectly omits the NSEC record proving nonexistence of QNAME
from the AUTHORITY section.

The same block of code is affected by another flaw: if the same NSEC
record proves nonexistence of both the QNAME and the DS record at the
parent side of the zone cut, this NSEC record will be added to the
AUTHORITY section twice.

Fix by looking for the NS RRset in the entire AUTHORITY section and
adding the NSEC record to the delegation using query_addrrset() (which
handles duplicate RRset detection).
2021-06-10 10:13:23 +02:00
Mark Andrews
098639dc59 Fix the variable checked by a post-load assertion
Instead of checking the value of the variable modified two lines earlier
(the number of SOA records present at the apex of the old version of the
zone), one of the RUNTIME_CHECK() assertions in zone_postload() checks
the number of SOA records present at the apex of the new version of the
zone, which is already checked before.  Fix the assertion by making it
check the correct variable.
2021-06-10 10:01:34 +02:00
Mark Andrews
2bc454dc2d Adjust acceptable count values
usleep(100000) can be slightly less than 10ms so allow the count
to reach 11.
2021-06-09 22:05:55 +00:00
Mark Andrews
3d66e97a28 Address race between zone_settimer and set_key_expiry_warning by
adding missing lock.

    WARNING: ThreadSanitizer: data race
    Read of size 4 at 0x000000000001 by thread T1 (mutexes: read M1, write M2):
    #0 isc_time_isepoch lib/isc/unix/time.c:110
    #1 zone_settimer lib/dns/zone.c:14649
    #2 dns_zone_maintenance lib/dns/zone.c:6281
    #3 dns_zonemgr_forcemaint lib/dns/zone.c:18190
    #4 view_loaded server.c:9654
    #5 call_loaddone lib/dns/zt.c:301
    #6 doneloading lib/dns/zt.c:575
    #7 zone_asyncload lib/dns/zone.c:2259
    #8 task_run lib/isc/task.c:845
    #9 isc_task_run lib/isc/task.c:938
    #10 isc__nm_async_task lib/isc/netmgr/netmgr.c:855
    #11 process_netievent lib/isc/netmgr/netmgr.c:934
    #12 process_queue lib/isc/netmgr/netmgr.c:1003
    #13 process_all_queues lib/isc/netmgr/netmgr.c:775
    #14 async_cb lib/isc/netmgr/netmgr.c:804
    #15 <null> <null>
    #16 isc__trampoline_run lib/isc/trampoline.c:191
    #17 <null> <null>

    Previous write of size 4 at 0x000000000001 by thread T2:
    #0 isc_time_set lib/isc/unix/time.c:93
    #1 set_key_expiry_warning lib/dns/zone.c:6430
    #2 del_sigs lib/dns/zone.c:6711
    #3 zone_resigninc lib/dns/zone.c:7113
    #4 zone_maintenance lib/dns/zone.c:11111
    #5 zone_timer lib/dns/zone.c:14588
    #6 task_run lib/isc/task.c:845
    #7 isc_task_run lib/isc/task.c:938
    #8 isc__nm_async_task lib/isc/netmgr/netmgr.c:855
    #9 process_netievent lib/isc/netmgr/netmgr.c:934
    #10 process_queue lib/isc/netmgr/netmgr.c:1003
    #11 process_all_queues lib/isc/netmgr/netmgr.c:775
    #12 async_cb lib/isc/netmgr/netmgr.c:804
    #13 <null> <null>
    #14 isc__trampoline_run lib/isc/trampoline.c:191
    #15 <null> <null>

    SUMMARY: ThreadSanitizer: data race lib/isc/unix/time.c:110 in isc_time_isepoch
2021-06-09 13:31:05 +00:00
Ondřej Surý
440fb3d225 Completely remove BIND 9 Windows support
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.

We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
2021-06-09 14:35:14 +02:00
Matthijs Mekking
0ae3ffdc1c Fix NSEC3 resalting upon restart
When named restarts, it will examine signed zones and checks if the
current denial of existence strategy matches the dnssec-policy. If not,
it will schedule to create a new NSEC(3) chain.

However, on startup the zone database may not be read yet, fooling
BIND that the denial of existence chain needs to be created. This
results in a replacement of the previous NSEC(3) chain.

Change the code such that if the NSEC3PARAM lookup failed (the result
did not return in ISC_R_SUCCESS or ISC_R_NOTFOUND), we will try
again later. The nsec3param structure has additional variables to
signal if the lookup is postponed. We also need to save the signal
if an explicit resalt was requested.

In addition to the two added boolean variables, we add a variable to
store the NSEC3PARAM rdata. This may have a yet to be determined salt
value. We can't create the private data yet because there may be a
mismatch in salt length and the NULL salt value.
2021-06-09 09:14:09 +02:00
Ondřej Surý
7e59b8a4a1 Pause the dbiterator when dumping the zone to the disk
When we rewrote the zone dumping to use the separate threadpool, the
dumping would acquire the read lock for the whole time the zone dumping
process is dumping the zone.

When combined with incoming IXFR that tries to acquire the write lock on
the same rwlock, we would end up blocking all the other readers.

In this commit, we pause the dbiterator every time we get next record
and before start dumping it to the disk.
2021-06-04 08:25:05 +00:00
Mark Andrews
66d1df57cb Report which assertion failed when calling set_global_error 2021-06-03 11:55:31 +10:00
Ondřej Surý
f14d870d15 Fix copy&paste error in setsockopt_off
Because of copy&paste error the setsockopt_off macro would enable
the socket option instead of disabling it.
2021-06-02 17:47:14 +02:00
Ondřej Surý
67afea6cfc Cleanup the remaining of HAVE_UV_<func> macros
While cleaning up the usage of HAVE_UV_<func> macros, we forgot to
cleanup the HAVE_UV_UDP_CONNECT in the actual code and
HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail
on uv_udp_send() because the socket was already connected and we were
falsely assuming that it was not.

The platforms with autoconf support were not affected, because we were
still checking for the functions from the configure.
2021-06-02 11:23:36 +02:00
Artem Boldariev
35d0027f36 HTTP/2 write buffering
This commit adds the ability to consolidate HTTP/2 write requests if
there is already one in flight. If it is the case, the code will
consolidate multiple subsequent write request into a larger one
allowing to utilise the network in a more efficient way by creating
larger TCP packets as well as by reducing TLS records overhead (by
creating large TLS records instead of multiple small ones).

This optimisation is especially efficient for clients, creating many
concurrent HTTP/2 streams over a transport connection at once.  This
way, the code might create a small amount of multi-kilobyte requests
instead of many 50-120 byte ones.

In fact, it turned out to work so well that I had to add a work-around
to the code to ensure compatibility with the flamethrower, which, at
the time of writing, does not support TLS records larger than two
kilobytes. Now the code tries to flush the write buffer after 1.5
kilobyte, which is still pretty adequate for our use case.

Essentially, this commit implements a recommendation given by nghttp2
library:

https://nghttp2.org/documentation/nghttp2_session_mem_send.html
2021-06-01 21:07:45 +03:00
Ondřej Surý
e83b6569da Indicate to the kernel that we won't be needing the zone dumps
Add a call to posix_fadvise() to indicate to the kernel, that `named`
won't be needing the dumped zone files any time soon with:

 * POSIX_FADV_DONTNEED - The specified data will not be accessed in the
   near future.

Notes:

 POSIX_FADV_DONTNEED attempts to free cached pages associated with the
 specified region. This is useful, for example, while streaming large
 files. A program may periodically request the kernel to free cached
 data that has already been used, so that more useful cached pages are
 not discarded instead.
2021-05-31 14:52:05 +02:00
Ondřej Surý
8a5c62de83 Refactor zone dumping code to use netmgr async threadpools
Previously, dumping the zones to the files were quantized, so it doesn't
slow down network IO processing.  With the introduction of network
manager asynchronous threadpools, we can move the IO intensive work to
use that API and we don't have to quantize the work anymore as it the
file IO won't block anything except other zone dumping processes.
2021-05-31 14:52:05 +02:00
Ondřej Surý
7670f98377 Add isc_task_getnetmgr() function
Add a function to pull the attached netmgr from inside the executed
task.  This is needed for any task that needs to call the netmgr API.
2021-05-31 14:52:05 +02:00
Ondřej Surý
87fe97ed91 Add asynchronous work API to the network manager
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.

This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.

The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
2021-05-31 14:52:05 +02:00
Ondřej Surý
211bfefbaa Use UV_VERSION_HEX to decide whether we need libuv shim functions
Instead of having a configure check for every missing function that has
been added in later version of libuv, we now use UV_VERSION_HEX to
decide whether we need the shim or not.
2021-05-31 14:52:05 +02:00
Ondřej Surý
7477d1b2ed Add uv_os_getenv() and uv_os_setenv() compatibility shims
The uv_os_getenv() and uv_os_setenv() functions were introduced in the
libuv >= 1.12.0.  Add simple compatibility shims for older versions.
2021-05-31 14:52:05 +02:00
Ondřej Surý
f752840db3 Add uv_req_get_data() and uv_req_set_data() compatibility shims
The uv_req_get_data() and uv_req_set_data() functions were introduced in
libuv >= 1.19.0, so we need to add compatibility shims with older libuv
versions.
2021-05-31 14:52:05 +02:00
Matthijs Mekking
f7f543d99b Reuse rdatset->ttl when dumping ancient RRsets
Rather than having an expensive 'expired' (fka 'stale_ttl') in the
rdataset structure, that is only used to be printed in a comment on
ancient RRsets, reuse the TTL field of the RRset.
2021-05-30 11:48:36 -07:00
Kevin Chen
0cdf85d204 Several serve-stale improvements
Commit a83c8cb0af updated masterdump so
that stale records in "rndc dumpdb" output no longer shows 0 TTLs.  In
this commit we change the name of the `rdataset->stale_ttl` field to
`rdataset->expired` to make its purpose clearer, and set it to zero in
cases where it's unused.

Add 'rbtdb->serve_stale_ttl' to various checks so that stale records
are not purged from the cache when they've been stale for RBTDB_VIRTUAL
(300) seconds.

Increment 'ns_statscounter_usedstale' when a stale answer is used.

Note: There was a question of whether 'overmem_purge' should be
purging ancient records, instead of stale ones.  It is left as purging
stale records, since stale records could take up the majority of the
cache.

This submission is copyrighted Akamai Technologies, Inc. and provided
under an MPL 2.0 license.

This commit was originally authored by Kevin Chen, and was updated by
Matthijs Mekking to match recent serve-stale developments.
2021-05-30 11:45:35 -07:00
Matthijs Mekking
c0dc5937c7 Reset DNS_FETCHOPT_TRYSTALE_ONTIMEOUT on resume
Once we resume a query, we should clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT
from the options to prevent triggering the stale-answer-client-timeout
on subsequent fetches.

If we don't this may cause a crash when for example when prefetch is
triggered after a query restart.
2021-05-30 00:03:51 -07:00
Evan Hunt
8bd8e995f1 clean up query correctly if already answered by serve-stale
when a serve-stale answer has been sent, the client continues waiting
for a proper answer. if a final completion event for the client does
arrive, it can just be cleaned up without sending a response, similar
to a canceled fetch.
2021-05-27 10:35:48 -07:00
Mark Andrews
d68b009cfe Remove priority from attribute constructor/destructor
On some platforms, the __attribute__ constructor and destructor won't
take priorities and the compilation failed.  On such platform would be
macOS.  For this reason, the constructor/destructor in the libisc was
reworked to not use priorities, but have a single constructor and
destructor that calls the appropriate routines in correct order.

This commit removes the extra priority because it's now not needed and
it also breaks a compilation on macOS with GCC 10.
2021-05-27 08:02:21 +02:00
Mark Andrews
715a2c7fc1 Add missing initialisations
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
2021-05-26 08:15:08 +00:00
Ondřej Surý
2db5290579 Fix the sizeof() for array holding the pointers to clientmgr
The size of the array holding the pointers to clientmgr was created so
big it could hold the actual clientmgr objects, not just the pointer.
This commit fixes the size to be just the ncpus * sizeof(pointer).
2021-05-26 10:03:52 +02:00
Ondřej Surý
a227562f13 Cleanup the struct isc_nmiface
In previous MR, I forgot to remove the `struct isc_nmiface`, this commit
rectifies that.
2021-05-26 09:55:10 +02:00
Ondřej Surý
50270de8a0 Refactor the interface handling in the netmgr
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before.  This means less type-casting and
shorter path to access isc_sockaddr_t members.

At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
2021-05-26 09:43:12 +02:00
Mark Andrews
0a45af2e2f Consolidate xhdr fixups 2021-05-26 08:16:35 +10:00
Mark Andrews
00609f5094 Correct size calculation in dns_journal_iter_init()
* dns_journal_next() leaves the read point in the journal after the
transaction header so journal_seek() should be inside the loop.
* we need to recover from transaction header inconsistencies

Additionally when correcting for <size, serial0, serial1, 0> the
correct consistency check is isc_serial_gt() rather than
isc_serial_ge().  All instances updated.
2021-05-25 22:27:54 +10:00
Ondřej Surý
d0d37aa6d1 Don't set memory context name in resolver.c
We now attach to existing memory context instead of creating a new
memory context, so we should not set its name.
2021-05-25 07:25:44 +02:00
Ondřej Surý
a1c6fd5ede Adjust the fillcount and freemax for dns_message mempools
According to the measurements (recorded on GL!5085), the fillcount of 2
for namepool and fillcount of 4 for rdspool can fit 99.99% of request
for tested scenarios.

This was discovered by perf recording the single second recursive test
using flamethrower where the initial malloc lit up like a flare.
2021-05-24 20:44:58 +02:00
Ondřej Surý
28b65d8256 Reduce the number of clientmgr objects created
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.

We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).

Each clientmgr object than would have a single task and single memory
context.
2021-05-24 20:44:54 +02:00
Ondřej Surý
aad7856b8e Don't create per bucket memory contexts in resolver
Similarly, the resolver code would create hundreds of memory contexts
just on the resolver setup.  The contention will be reduced directly in
the allocator, so for now just attach to the view memory instead of
creating separate memory context for each bucket.
2021-05-24 20:02:20 +02:00