The last remaining defines needed for platforms without NAME_MAX and
PATH_MAX (I'm looking at you, GNU Hurd) were moved to isc/dir.h where
it's prevalently used.
The ISC_STRERRORSIZE was defined in isc/platform.h header as the
value was different between Windows and POSIX platforms. Now that
Windows is gone, move the define to where it belongs.
Previously, each protocol (TCPDNS, TLSDNS) has specified own function to
disable pipelining on the connection. An oversight would lead to
assertion failure when opcode is not query over non-TCPDNS protocol
because the isc_nm_tcpdns_sequential() function would be called over
non-TCPDNS socket. This commit removes the per-protocol functions and
refactors the code to have and use common isc_nm_sequential() function
that would either disable the pipelining on the socket or would handle
the request in per specific manner. Currently it ignores the call for
HTTP sockets and causes assertion failure for protocols where it doesn't
make sense to call the function at all.
The requirements for BIND 9.17+ now requires C11 support from the
compiler, so we can safely drop most of the stdatomic.h shims from
lib/isc/unix/include/stdatomic.h.
This commit removes support for clang atomic builtins (clang >= 3.6.0
includes stdatomic.h header) and for Gcc __sync builtins.
The only compatibility shim that remains is support for __atomic
builtins for Gcc >= 4.7.0 since CentOS 7 still includes only Gcc 4.8.1
and the proper stdatomic.h header was only introduced in Gcc >= 4.9.
We cannot use DoH for zone transfers. According to RFC8484 a DoH
request contains exactly one DNS message (see Section 6: Definition of
the "application/dns-message" Media Type,
https://datatracker.ietf.org/doc/html/rfc8484#section-6). This makes
DoH unsuitable for zone transfers as often (and usually!) these need
more than one DNS message, especially for larger zones.
As zone transfers over DoH are not (yet) standardised, nor discussed
in RFC8484, the best thing we can do is to return "not implemented."
Technically DoH can be used to transfer small zones which fit in one
message, but that is not enough for the generic case.
Also, this commit makes the server-side DoH code ensure that no
multiple responses could be attempted to be sent over one HTTP/2
stream. In HTTP/2 one stream is mapped to one request/response
transaction. Now the write callback will be called with failure error
code in such a case.
The Windows support has been completely removed from the source tree
and BIND 9 now no longer supports native compilation on Windows.
We might consider reviewing mingw-w64 port if contributed by external
party, but no development efforts will be put into making BIND 9 compile
and run on Windows again.
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.
This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.
The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
On some platforms, the __attribute__ constructor and destructor won't
take priorities and the compilation failed. On such platform would be
macOS. For this reason, the constructor/destructor in the libisc was
reworked to not use priorities, but have a single constructor and
destructor that calls the appropriate routines in correct order.
This commit removes the extra priority because it's now not needed and
it also breaks a compilation on macOS with GCC 10.
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
Previously, as a way of reducing the contention between threads a
clientmgr object would be created for each interface/IP address.
We tasks being more strictly bound to netmgr workers, this is no longer
needed and we can just create clientmgr object per worker queue (ncpus).
Each clientmgr object than would have a single task and single memory
context.
Since a client object is bound to a netmgr handle, each client
will always be processed by the same netmgr worker, so we can
simplify the code by binding client->task to the same thread as
the client. Since ns__client_request() now runs in the same event
loop as client->task events, is no longer necessary to pause the
task manager before launching them.
Also removed some functions in isc_task that were not used.
Instead of using fixed quantum, this commit adds atomic counter for
number of items on each queue and uses the number of netievents
scheduled to run as the limit of maximum number of netievents for a
single process_queue() run.
This prevents the endless loops when the netievent would schedule more
netievents onto the same loop, but we don't have to pick "magic" number
for the quantum.
This commit adds a new configuration option to set the receive and send
buffer sizes on the TCP and UDP netmgr sockets. The default is `0`
which doesn't set any value and just uses the value set by the operating
system.
There's no magic value here - set it too small and the performance will
drop, set it too large, the buffers can fill-up with queries that have
already timeouted on the client side and nobody is interested for the
answer and this would just make the server clog up even more by making
it produce useless work.
The `netstat -su` can be used on POSIX systems to monitor the receive
and send buffer errors.
The netmgr listening, stoplistening, pausing and resuming functions
now use barriers for synchronization, which makes the code much simpler.
isc/barrier.h defines isc_barrier macros as a front-end for uv_barrier
on platforms where that works, and pthread_barrier where it doesn't
(including TSAN builds).
all zone loading tasks have the privileged flag, but we only want
them to run as privileged tasks when the server is being initialized;
if we privilege them the rest of the time, the server may hang for a
long time after a reload/reconfig. so now we call isc_taskmgr_setmode()
to turn privileged execution mode on or off in the task manager.
isc_task_privileged() returns true if the task's privilege flag is
set *and* the taskmgr is in privileged execution mode. this is used
to determine in which netmgr event queue the task should be run.
There was a theoretical possibility of clogging up the queue processing
with an endless loop where currently processing netievent would schedule
new netievent that would get processed immediately. This wasn't such a
problem when only netmgr netievents were processed, but with the
addition of the tasks, there are at least two situation where this could
happen:
1. In lib/dns/zone.c:setnsec3param() the task would get re-enqueued
when the zone was not yet fully loaded.
2. Tasks have internal quantum for maximum number of isc_events to be
processed, when the task quantum is reached, the task would get
rescheduled and then immediately processed by the netmgr queue
processing.
As the isc_queue doesn't have a mechanism to atomically move the queue,
this commit adds a mechanism to quantize the queue, so enqueueing new
netievents will never stop processing other uv_loop_t events.
The default quantum size is 128.
Since the queue used in the network manager allows items to be enqueued
more than once, tasks are now reference-counted around task_ready()
and task_run(). task_ready() now has a public API wrapper,
isc_task_ready(), that the netmgr can use to reschedule processing
of a task if the quantum has been reached.
Incidental changes: Cleaned up some unused fields left in isc_task_t
and isc_taskmgr_t after the last refactoring, and changed atomic
flags to atomic_bools for easier manipulation.
With taskmgr running on top of netmgr, the ordering of how the tasks and
netmgr shutdown interacts was wrong as previously isc_taskmgr_destroy()
was waiting until all tasks were properly shutdown and detached. This
responsibility was moved to netmgr, so we now need to do the following:
1. shutdown all the tasks - this schedules all shutdown events onto
the netmgr queue
2. shutdown the netmgr - this also makes sure all the tasks and
events are properly executed
3. Shutdown the taskmgr - this now waits for all the tasks to finish
running before returning
4. Shutdown the netmgr - this call waits for all the netmgr netievents
to finish before returning
This solves the race when the taskmgr object would be destroyed before
all the tasks were finished running in the netmgr loops.
Previously, netmgr, taskmgr, timermgr and socketmgr all had their own
isc_<*>mgr_create() and isc_<*>mgr_destroy() functions. The new
isc_managers_create() and isc_managers_destroy() fold all four into a
single function and makes sure the objects are created and destroy in
correct order.
Especially now, when taskmgr runs on top of netmgr, the correct order is
important and when the code was duplicated at many places it's easy to
make mistake.
The former isc_<*>mgr_create() and isc_<*>mgr_destroy() functions were
made private and a single call to isc_managers_create() and
isc_managers_destroy() is required at the program startup / shutdown.
This commit adds support for generating backtraces on Windows and
refactors the isc_backtrace API to match the Linux/BSD API (without
the isc_ prefix)
* isc_backtrace_gettrace() was renamed to isc_backtrace(), the third
argument was removed and the return type was changed to int
* isc_backtrace_symbols() was added
* isc_backtrace_symbols_fd() was added and used as appropriate
The malloc attribute allows compiler to do some optmizations on
functions that behave like malloc/calloc, like assuming that the
returned pointer do not alias other pointers.
This commit changes the taskmgr to run the individual tasks on the
netmgr internal workers. While an effort has been put into keeping the
taskmgr interface intact, couple of changes have been made:
* The taskmgr has no concept of universal privileged mode - rather the
tasks are either privileged or unprivileged (normal). The privileged
tasks are run as a first thing when the netmgr is unpaused. There
are now four different queues in in the netmgr:
1. priority queue - netievent on the priority queue are run even when
the taskmgr enter exclusive mode and netmgr is paused. This is
needed to properly start listening on the interfaces, free
resources and resume.
2. privileged task queue - only privileged tasks are queued here and
this is the first queue that gets processed when network manager
is unpaused using isc_nm_resume(). All netmgr workers need to
clean the privileged task queue before they all proceed normal
operation. Both task queues are processed when the workers are
finished.
3. task queue - only (traditional) task are scheduled here and this
queue along with privileged task queues are process when the
netmgr workers are finishing. This is needed to process the task
shutdown events.
4. normal queue - this is the queue with netmgr events, e.g. reading,
sending, callbacks and pretty much everything is processed here.
* The isc_taskmgr_create() now requires initialized netmgr (isc_nm_t)
object.
* The isc_nm_destroy() function now waits for indefinite time, but it
will print out the active objects when in tracing mode
(-DNETMGR_TRACE=1 and -DNETMGR_TRACE_VERBOSE=1), the netmgr has been
made a little bit more asynchronous and it might take longer time to
shutdown all the active networking connections.
* Previously, the isc_nm_stoplistening() was a synchronous operation.
This has been changed and the isc_nm_stoplistening() just schedules
the child sockets to stop listening and exits. This was needed to
prevent a deadlock as the the (traditional) tasks are now executed on
the netmgr threads.
* The socket selection logic in isc__nm_udp_send() was flawed, but
fortunatelly, it was broken, so we never hit the problem where we
created uvreq_t on a socket from nmhandle_t, but then a different
socket could be picked up and then we were trying to run the send
callback on a socket that had different threadid than currently
running.
Since all the libraries are internal now, just cleanup the ISCAPI remnants
in isc_socket, isc_task and isc_timer APIs. This means, there's one less
layer as following changes have been done:
* struct isc_socket and struct isc_socketmgr have been removed
* struct isc__socket and struct isc__socketmgr have been renamed
to struct isc_socket and struct isc_socketmgr
* struct isc_task and struct isc_taskmgr have been removed
* struct isc__task and struct isc__taskmgr have been renamed
to struct isc_task and struct isc_taskmgr
* struct isc_timer and struct isc_timermgr have been removed
* struct isc__timer and struct isc__timermgr have been renamed
to struct isc_timer and struct isc_timermgr
* All the associated code that dealt with typing isc_<foo>
to isc__<foo> and back has been removed.
Previously, the taskmgr, timermgr and socketmgr had a constructor
variant, that would create the mgr on top of existing appctx. This was
no longer true and isc_<*>mgr was just calling isc_<*>mgr_create()
directly without any extra code.
This commit just cleans up the extra function.
The isc_nm_*connect() functions were refactored to always return the
connection status via the connect callback instead of sometimes returning
the hard failure directly (for example, when the socket could not be
created, or when the network manager was shutting down).
This commit changes the connect functions in all the network manager
modules, and also makes the necessary refactoring changes in places
where the connect functions are called.
util.h requires ISC_CONSTRUCTOR definition, which depends on config.h
inclusion. It does not include it from isc/util.h (or any other header).
Using isc/util.h fails hard when isc/util.h is used without including
bind's config.h.
Move the check to c file, where ISC_CONSTRUCTOR is used. Ensure config.h
is included there.
The current isc_time_now uses CLOCK_REALTIME_COARSE which only updates
on a timer tick. This clock is generally fine for millisecond accuracy,
but on servers with 100hz clocks, this clock is nowhere near accurate
enough for microsecond accuracy.
This commit adds a new isc_time_now_hires function that uses
CLOCK_REALTIME, which gives the current time, though it is somewhat
expensive to call. When microsecond accuracy is required, it may be
required to use extra resources for higher accuracy.
The RFC7828 specifies the keepalive interval to be 16-bit, specified in
units of 100 milliseconds and the configuration options tcp-*-timeouts
are following the suit. The units of 100 milliseconds are very
unintuitive and while we can't change the configuration and presentation
format, we should not follow this weird unit in the API.
This commit changes the isc_nm_(get|set)timeouts() functions to work
with milliseconds and convert the values to milliseconds before passing
them to the function, not just internally.
After the TCPDNS refactoring the initial and idle timers were broken and
only the tcp-initial-timeout was always applied on the whole TCP
connection.
This broke any TCP connection that took longer than tcp-initial-timeout,
most often this would affect large zone AXFRs.
This commit changes the timeout logic in this way:
* On TCP connection accept the tcp-initial-timeout is applied
and the timer is started
* When we are processing and/or sending any DNS message the timer is
stopped
* When we stop processing all DNS messages, the tcp-idle-timeout
is applied and the timer is started again
- style, cleanup, and removal of unnecessary code.
- combined isc_nm_http_add_endpoint() and isc_nm_http_add_doh_endpoint()
into one function, renamed isc_http_endpoint().
- moved isc_nm_http_connect_send_request() into doh_test.c as a helper
function; remove it from the public API.
- renamed isc_http2 and isc_nm_http2 types and functions to just isc_http
and isc_nm_http, for consistency with other existing names.
- shortened a number of long names.
- the caller is now responsible for determining the peer address.
in isc_nm_httpconnect(); this eliminates the need to parse the URI
and the dependency on an external resolver.
- the caller is also now responsible for creating the SSL client context,
for consistency with isc_nm_tlsdnsconnect().
- added setter functions for HTTP/2 ALPN. instead of setting up ALPN in
isc_tlsctx_createclient(), we now have a function
isc_tlsctx_enable_http2client_alpn() that can be run from
isc_nm_httpconnect().
- refactored isc_nm_httprequest() into separate read and send functions.
isc_nm_send() or isc_nm_read() is called on an http socket, it will
be stored until a corresponding isc_nm_read() or _send() arrives; when
we have both halves of the pair the HTTP request will be initiated.
- isc_nm_httprequest() is renamed isc__nm_http_request() for use as an
internal helper function by the DoH unit test. (eventually doh_test
should be rewritten to use read and send, and this function should
be removed.)
- added implementations of isc__nm_tls_settimeout() and
isc__nm_http_settimeout().
- increased NGHTTP2 header block length for client connections to 128K.
- use isc_mem_t for internal memory allocations inside nghttp2, to
help track memory leaks.
- send "Cache-Control" header in requests and responses. (note:
currently we try to bypass HTTP caching proxies, but ideally we should
interact with them: https://tools.ietf.org/html/rfc8484#section-5.1)
Instead of calling isc_tls_initialize()/isc_tls_destroy() explicitly use
gcc/clang attributes on POSIX and DLLMain on Windows to initialize and
shutdown OpenSSL library.
This resolves the issue when isc_nm_create() / isc_nm_destroy() was
called multiple times and it would call OpenSSL library destructors from
isc_nm_destroy().
At the same time, since we now have introduced the ctor/dtor for libisc,
this commit moves the isc_mem API initialization (the list of the
contexts) and changes the isc_mem_checkdestroyed() to schedule the
checking of memory context on library unload instead of executing the
code immediately.
The two memory debugging features: ISC_MEM_DEFAULTFILL
(ISC_MEMFLAG_FILL) and ISC_MEM_TRACKLINES were always enabled in all
builds and the former was only disabled in `named`.
This commits disables those two features in non-developer build to make
the memory allocator significantly faster.
This is yet another step into unlocking some parts of the memory
contexts. All the regularly updated variables has been turned into
atomic types, so we can later remove the locks when updating various
counters.
Also unlock as much code as possible without breaking anything.
On 24-core machine, the tests would crash because we would run out of
the hazard pointers. We now adjust the number of hazard pointers to be
in the <128,256> interval based on the number of available cores.
Note: This is just a band-aid and needs a proper fix.
Previously, the applications using libisc would be able to override the
internal memory methods with own implementation. This was no longer
possible, but the extra level of indirection was not removed. This
commit removes the extra level of indirection for the memory methods and
the default_memalloc() and default_memfree().
The internal memory allocator had an extra code to keep a list of blocks
for small size allocation. This would help to reduce the interactions
with the system malloc as the memory would be already allocated from the
system, but there's an extra cost associated with that - all the
allocations/deallocations must be locked, effectively eliminating any
optimizations in the system allocator targeted at multi-threaded
applications. While the isc_mem API is still using locks pretty heavily,
this is a first step into reducing the memory allocation/deallocation
contention.
The <isc/readline.h> header provided a compatibility shim to use when
other non-GNU readline libraries are in use. The two places where
readline library is being used is nslookup and nsupdate, so the header
file has been moved to bin/dig directory and it's directly included from
bin/nsupdate.
This also conceals any readline headers exposed from the libisc headers.
This commit completes the support for DNS-over-HTTP(S) built on top of
nghttp2 and plugs it into the BIND. Support for both GET and POST
requests is present, as required by RFC8484.
Both encrypted (via TLS) and unencrypted HTTP/2 connections are
supported. The latter are mostly there for debugging/troubleshooting
purposes and for the means of encryption offloading to third-party
software (as might be desirable in some environments to simplify TLS
certificates management).
This commit includes work-in-progress implementation of
DNS-over-HTTP(S).
Server-side code remains mostly untested, and there is only support
for POST requests.
This commit resurrects the old TLS code from
8f73c70d23.
It also includes numerous stability fixes and support for
isc_nm_cancelread() for the TLS layer.
The code was resurrected to be used for DoH.
* Following the example set in 634bdfb16d, the tlsdns netmgr
module now uses libuv and SSL primitives directly, rather than
opening a TLS socket which opens a TCP socket, as the previous
model was difficult to debug. Closes#2335.
* Remove the netmgr tls layer (we will have to re-add it for DoH)
* Add isc_tls API to wrap the OpenSSL SSL_CTX object into libisc
library; move the OpenSSL initialization/deinitialization from dstapi
needed for OpenSSL 1.0.x to the isc_tls_{initialize,destroy}()
* Add couple of new shims needed for OpenSSL 1.0.x
* When LibreSSL is used, require at least version 2.7.0 that
has the best OpenSSL 1.1.x compatibility and auto init/deinit
* Enforce OpenSSL 1.1.x usage on Windows
* Added a TLSDNS unit test and implemented a simple TLSDNS echo
server and client.
This is a part of the works that intends to make the netmgr stable,
testable, maintainable and tested. It contains a numerous changes to
the netmgr code and unfortunately, it was not possible to split this
into smaller chunks as the work here needs to be committed as a complete
works.
NOTE: There's a quite a lot of duplicated code between udp.c, tcp.c and
tcpdns.c and it should be a subject to refactoring in the future.
The changes that are included in this commit are listed here
(extensively, but not exclusively):
* The netmgr_test unit test was split into individual tests (udp_test,
tcp_test, tcpdns_test and newly added tcp_quota_test)
* The udp_test and tcp_test has been extended to allow programatic
failures from the libuv API. Unfortunately, we can't use cmocka
mock() and will_return(), so we emulate the behaviour with #define and
including the netmgr/{udp,tcp}.c source file directly.
* The netievents that we put on the nm queue have variable number of
members, out of these the isc_nmsocket_t and isc_nmhandle_t always
needs to be attached before enqueueing the netievent_<foo> and
detached after we have called the isc_nm_async_<foo> to ensure that
the socket (handle) doesn't disappear between scheduling the event and
actually executing the event.
* Cancelling the in-flight TCP connection using libuv requires to call
uv_close() on the original uv_tcp_t handle which just breaks too many
assumptions we have in the netmgr code. Instead of using uv_timer for
TCP connection timeouts, we use platform specific socket option.
* Fix the synchronization between {nm,async}_{listentcp,tcpconnect}
When isc_nm_listentcp() or isc_nm_tcpconnect() is called it was
waiting for socket to either end up with error (that path was fine) or
to be listening or connected using condition variable and mutex.
Several things could happen:
0. everything is ok
1. the waiting thread would miss the SIGNAL() - because the enqueued
event would be processed faster than we could start WAIT()ing.
In case the operation would end up with error, it would be ok, as
the error variable would be unchanged.
2. the waiting thread miss the sock->{connected,listening} = `true`
would be set to `false` in the tcp_{listen,connect}close_cb() as
the connection would be so short lived that the socket would be
closed before we could even start WAIT()ing
* The tcpdns has been converted to using libuv directly. Previously,
the tcpdns protocol used tcp protocol from netmgr, this proved to be
very complicated to understand, fix and make changes to. The new
tcpdns protocol is modeled in a similar way how tcp netmgr protocol.
Closes: #2194, #2283, #2318, #2266, #2034, #1920
* The tcp and tcpdns is now not using isc_uv_import/isc_uv_export to
pass accepted TCP sockets between netthreads, but instead (similar to
UDP) uses per netthread uv_loop listener. This greatly reduces the
complexity as the socket is always run in the associated nm and uv
loops, and we are also not touching the libuv internals.
There's an unfortunate side effect though, the new code requires
support for load-balanced sockets from the operating system for both
UDP and TCP (see #2137). If the operating system doesn't support the
load balanced sockets (either SO_REUSEPORT on Linux or SO_REUSEPORT_LB
on FreeBSD 12+), the number of netthreads is limited to 1.
* The netmgr has now two debugging #ifdefs:
1. Already existing NETMGR_TRACE prints any dangling nmsockets and
nmhandles before triggering assertion failure. This options would
reduce performance when enabled, but in theory, it could be enabled
on low-performance systems.
2. New NETMGR_TRACE_VERBOSE option has been added that enables
extensive netmgr logging that allows the software engineer to
precisely track any attach/detach operations on the nmsockets and
nmhandles. This is not suitable for any kind of production
machine, only for debugging.
* The tlsdns netmgr protocol has been split from the tcpdns and it still
uses the old method of stacking the netmgr boxes on top of each other.
We will have to refactor the tlsdns netmgr protocol to use the same
approach - build the stack using only libuv and openssl.
* Limit but not assert the tcp buffer size in tcp_alloc_cb
Closes: #2061
Add server-side TLS support to netmgr - that includes moving some of the
isc_nm_ functions from tcp.c to a wrapper in netmgr.c calling a proper
tcp or tls function, and a new isc_nm_listentls() function.
Add DoT support to tcpdns - isc_nm_listentlsdns().
this function sets the read timeout for the socket associated
with a netmgr handle and, if the timer is running, resets it.
for TCPDNS sockets it also sets the read timeout and resets the
timer on the outer TCP socket.
- isc_nm_tcpdnsconnect() sets up up an outgoing TCP DNS connection.
- isc_nm_tcpconnect(), _udpconnect() and _tcpdnsconnect() now take a
timeout argument to ensure connections time out and are correctly
cleaned up on failure.
- isc_nm_read() now supports UDP; it reads a single datagram and then
stops until the next time it's called.
- isc_nm_cancelread() now runs asynchronously to prevent assertion
failure if reading is interrupted by a non-network thread (e.g.
a timeout).
- isc_nm_cancelread() can now apply to UDP sockets.
- added shim code to support UDP connection in versions of libuv
prior to 1.27, when uv_udp_connect() was added
all these functions will be used to support outgoing queries in dig,
xfrin, dispatch, etc.
1. The isc__nm_tcp_send() and isc__nm_tcp_read() was not checking
whether the socket was still alive and scheduling reads/sends on
closed socket.
2. The isc_nm_read(), isc_nm_send() and isc_nm_resumeread() have been
changed to always return the error conditions via the callbacks, so
they always succeed. This applies to all protocols (UDP, TCP and
TCPDNS).
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
control channel connection objects now contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- cmdhandle: held while an rndc command is running
httpd connections contain:
- readhandle: held while waiting for a read callback
- sendhandle: held while waiting for a send callback
- rename isc_nmsocket_t->tcphandle to statichandle
- cancelread functions now take handles instead of sockets
- add a 'client' flag in socket objects, currently unused, to
indicate whether it is to be used as a client or server socket
While
if (isc_refcount_decrement() == 1) { // memory_order_release
isc_refcount_destroy(); // memory_order_acquire
...
}
is theoretically the most efficent in practice, using
memory_order_acq_rel produces the same code on x86_64 and doesn't
trigger tsan data races (which use a idealistic model) if
isc_refcount_destroy() is not called immediately. In fact
isc_refcount_destroy() could be removed if we didn't want
to check for the count being 0 when isc_refcount_destroy() is
called.
https://stackoverflow.com/questions/49112732/memory-order-in-shared-pointer-destructor
This commit updates and simplifies the checks for the readline support
in nslookup and nsupdate:
* Change the autoconf checks to pkg-config only, all supported
libraries have accompanying .pc files now.
* Add editline support in addition to libedit and GNU readline
* Add isc/readline.h shim header that defines dummy readline()
function when no readline library is available
When pk11_numbits() is passed a user provided input that contains all
zeroes (via crafted DNS message), it would crash with assertion
failure. Fix that by properly handling such input.
Created isc_refcount_decrement_expect macro to test conditionally
the return value to ensure it is in expected range. Converted
unchecked isc_refcount_decrement to use isc_refcount_decrement_expect.
Converted INSIST(isc_refcount_decrement()...) to isc_refcount_decrement_expect.
The stdatomic shims for non-C11 compilers (Windows, old gcc, ...) and
mutexatomic implemented only and minimal subset of the atomic types.
This commit adds 16-bit operations for Windows and all atomic types as
defined in standard.
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
The isc_nm_cancelread() function cancels reading on a connected
socket and calls its read callback function with a 'result'
parameter of ISC_R_CANCELED.
the isc_nm_tcpconnect() function establishes a client connection via
TCP. once the connection is esablished, a callback function will be
called with a newly created network manager handle.
there is no need for a caller to reference-count socket objects.
they need tto be able tto close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
when built with "configure --enable-singletrace", named will produce
detailed query logging at the highest debug level for any query with
query ID zero.
this enables monitoring of the progress of a single query by specifying
the QID using "dig +qid=0". the "client" logging category should be set
to a low severity level to suppress logging of other queries. (the
chance of another query using QID=0 at the same time is only 1 in 2^16.)
"--enable-singletrace" turns on "--enable-querytrace" as well, so if the
logging severity is not lowered, all other queries will be logged
verbosely as well. compiling with either of these options will impair
query performance; they should only be turned on when testing or
troubleshooting.
Instead of using bind() and passing the listening socket to the children
threads using uv_export/uv_import use one thread that does the accepting,
and then passes the connected socket using uv_export/uv_import to a random
worker. The previous solution had thundering herd problems (all workers
waking up on one connection and trying to accept()), this one avoids this
and is simpler.
The tcp clients quota is simplified with isc_quota_attach_cb - a callback
is issued when the quota is available.
The rewrite of BIND 9 build system is a large work and cannot be reasonable
split into separate merge requests. Addition of the automake has a positive
effect on the readability and maintainability of the build system as it is more
declarative, it allows conditional and we are able to drop all of the custom
make code that BIND 9 developed over the years to overcome the deficiencies of
autoconf + custom Makefile.in files.
This squashed commit contains following changes:
- conversion (or rather fresh rewrite) of all Makefile.in files to Makefile.am
by using automake
- the libtool is now properly integrated with automake (the way we used it
was rather hackish as the only official way how to use libtool is via
automake
- the dynamic module loading was rewritten from a custom patchwork to libtool's
libltdl (which includes the patchwork to support module loading on different
systems internally)
- conversion of the unit test executor from kyua to automake parallel driver
- conversion of the system test executor from custom make/shell to automake
parallel driver
- The GSSAPI has been refactored, the custom SPNEGO on the basis that
all major KRB5/GSSAPI (mit-krb5, heimdal and Windows) implementations
support SPNEGO mechanism.
- The various defunct tests from bin/tests have been removed:
bin/tests/optional and bin/tests/pkcs11
- The text files generated from the MD files have been removed, the
MarkDown has been designed to be readable by both humans and computers
- The xsl header is now generated by a simple sed command instead of
perl helper
- The <irs/platform.h> header has been removed
- cleanups of configure.ac script to make it more simpler, addition of multiple
macros (there's still work to be done though)
- the tarball can now be prepared with `make dist`
- the system tests are partially able to run in oot build
Here's a list of unfinished work that needs to be completed in subsequent merge
requests:
- `make distcheck` doesn't yet work (because of system tests oot run is not yet
finished)
- documentation is not yet built, there's a different merge request with docbook
to sphinx-build rst conversion that needs to be rebased and adapted on top of
the automake
- msvc build is non functional yet and we need to decide whether we will just
cross-compile bind9 using mingw-w64 or fix the msvc build
- contributed dlz modules are not included neither in the autoconf nor automake
The pk11/constants.h header contained static CK_BYTE arrays and
we had to use #defines to pull only those we need. This commit
changes the constants to only define byte arrays with the content
and either use them directly or define the CK_BYTE arrays locally
where used.
We introduce a isc_quota_attach_cb function - if ISC_R_QUOTA is returned
at the time the function is called, then a callback will be called when
there's quota available (with quota already attached). The callbacks are
organized as a LIFO queue in the quota structure.
It's needed for TCP client quota - with old networking code we had one
single place where tcp clients quota was processed so we could resume
accepting when the we had spare slots, but it's gone with netmgr - now
we need to notify the listener/accepter that there's quota available so
that it can resume accepting.
Remove unused isc_quota_force() function.
The isc_quote_reserve and isc_quota_release were used only internally
from the quota.c and the tests. We should not expose API we are not
using.
tcpdns used transport-specific functions to operate on the outer socket.
Use generic ones instead, and select the proper call in netmgr.c.
Make the missing functions (e.g. isc_nm_read) generic and add type-specific
calls (isc__nm_tcp_read). This is the preparation for netmgr TLS layer.
The isc_mem API now crashes on memory allocation failure, and this is
the next commit in series to cleanup the code that could fail before,
but cannot fail now, e.g. isc_result_t return type has been changed to
void for the isc_log API functions that could only return ISC_R_SUCCESS.
The <isc/md.h> header directly included <openssl/evp.h> header which
enforced all users of the libisc library to explicitly list the include
path to OpenSSL and link with -lcrypto. By hiding the specific
implementation into the private namespace, we no longer enforce this.
In the long run, this might also allow us to switch cryptographic
library implementation without affecting the downstream users.
While making the isc_md_type_t type opaque, the API using the data type
was changed to use the pointer to isc_md_type_t instead of using the
type directly.
The <isc/md.h> header directly included <openssl/hmac.h> header which
enforced all users of the libisc library to explicitly list the include
path to OpenSSL and link with -lcrypto. By hiding the specific
implementation into the private namespace, we no longer enforce this.
In the long run, this might also allow us to switch cryptographic
library implementation without affecting the downstream users.
The two "functions" that isc/safe.h declared before were actually simple
defines to matching OpenSSL functions. The downside of the approach was
enforcing all users of the libisc library to explicitly list the include
path to OpenSSL and link with -lcrypto. By hiding the specific
implementation into the private namespace changing the defines into
simple functions, we no longer enforce this. In the long run, this
might also allow us to switch cryptographic library implementation
without affecting the downstream users.
The previous commit removed the code related to the internal symbol
table. On platforms where available, we can now use backtrace_symbols()
to print more verbose symbols table to the output.
As there's now general availability of backtrace() and
backtrace_symbols() functions (see below), the commit also removes the
usage of glibc internals and the custom stack tracing.
* backtrace(), backtrace_symbols(), and backtrace_symbols_fd() are
provided in glibc since version 2.1.
* backtrace(), backtrace_symbols(), and backtrace_symbols_fd() first
appeared in Mac OS X 10.5.
* The backtrace() library of functions first appeared in NetBSD 7.0 and
FreeBSD 10.0.
This commit simplifies a bit the lock management within dns_resolver_prime()
and prime_done() functions by means of turning resolver's attribute
"priming" into an atomic_bool and by creating only one dependent object on the
lock "primelock", namely the "primefetch" attribute.
By having the attribute "priming" as an atomic type, it save us from having to
use a lock just to test if priming is on or off for the given resolver context
object, within "dns_resolver_prime" function.
The "primelock" lock is still necessary, since dns_resolver_prime() function
internally calls dns_resolver_createfetch(), and whenever this function
succeeds it registers an event in the task manager which could be called by
another thread, namely the "prime_done" function, and this function is
responsible for disposing the "primefetch" attribute in the resolver object,
also for resetting "priming" attribute to false.
It is important that the invariant "priming == false AND primefetch == NULL"
remains constant, so that any thread calling "dns_resolver_prime" knows for sure
that if the "priming" attribute is false, "primefetch" attribute should also be
NULL, so a new fetch context could be created to fulfill this purpose, and
assigned to "primefetch" attribute under the lock protection.
To honor the explanation above, dns_resolver_prime is implemented as follow:
1. Atomically checks the attribute "priming" for the given resolver context.
2. If "priming" is false, assumes that "primefetch" is NULL (this is
ensured by the "prime_done" implementation), acquire "primelock"
lock and create a new fetch context, update "primefetch" pointer to
point to the newly allocated fetch context.
3. If "priming" is true, assumes that the job is already in progress,
no locks are acquired, nothing else to do.
To keep the previous invariant consistent, "prime_done" is implemented as follow:
1. Acquire "primefetch" lock.
2. Keep a reference to the current "primefetch" object;
3. Reset "primefetch" attribute to NULL.
4. Release "primefetch" lock.
5. Atomically update "priming" attribute to false.
6. Destroy the "primefetch" object by using the temporary reference.
This ensures that if "priming" is false, "primefetch" was already reset to NULL.
It doesn't make any difference in having the "priming" attribute not protected
by a lock, since the visible state of this variable would depend on the calling
order of the functions "dns_resolver_prime" and "prime_done".
As an example, suppose that instead of using an atomic for the "priming" attribute
we employed a lock to protect it.
Now suppose that "prime_done" function is called by Thread A, it is then preempted
before acquiring the lock, thus not reseting "priming" to false.
In parallel to that suppose that a Thread B is scheduled and that it calls
"dns_resolver_prime()", it then acquires the lock and check that "priming" is true,
thus it will consider that this resolver object is already priming and it won't do
any more job.
Conversely if the lock order was acquired in the other direction, Thread B would check
that "priming" is false (since prime_done acquired the lock first and set "priming" to false)
and it would initiate a priming fetch for this resolver.
An atomic variable wouldn't change this behavior, since it would behave exactly the
same, depending on the function call order, with the exception that it would avoid
having to use a lock.
There should be no side effects resulting from this change, since the previous
implementation employed use of the more general resolver's "lock" mutex, which
is used in far more contexts, but in the specifics of the "dns_resolver_prime"
and "prime_done" it was only used to protect "primefetch" and "priming" attributes,
which are not used in any of the other critical sections protected by the same lock,
thus having zero dependency on those variables.
hp implementation requires an object for each thread accessing
a hazard pointer. previous implementation had a hardcoded
HP_MAX_THREAD value of 128, which failed on machines with lots of
CPU cores (named uses 3n threads). We make isc__hp_max_threads
configurable at startup, with the value set to 4*named_g_cpus.
It's also important for this value not to be too big as we do
linear searches on a list.
The isc_refcount API that provides reference counting lost DbC checks for
overflows and underflows in the isc_refcount_{increment,decrement} functions.
The commit restores the overflow check in the isc_refcount_increment and
underflows check in the isc_refcount_decrement by checking for the previous
value to not be on the boundary.
- the socket stat counters have been moved from socket.h to stats.h.
- isc_nm_t now attaches to the same stats counter group as
isc_socketmgr_t, so that both managers can increment the same
set of statistics
- isc__nmsocket_init() now takes an interface as a paramter so that
the address family can be determined when initializing the socket.
- based on the address family and socket type, a group of statistics
counters will be associated with the socket - for example, UDP4Active
with IPv4 UDP sockets and TCP6Active with IPv6 TCP sockets. note
that no counters are currently associated with TCPDNS sockets; those
stats will be handled by the underlying TCP socket.
- the counters are not actually used by netmgr sockets yet; counter
increment and decrement calls will be added in a later commit.
After the network manager rewrite, tcp-higwater stats was only being
updated when a valid DNS query was received over tcp.
It turns out tcp-quota is updated right after a tcp connection is
accepted, before any data is read, so in the event that some client
connect but don't send a valid query, it wouldn't be taken into
account to update tcp-highwater stats, that is wrong.
This commit fix tcp-highwater to update its stats whenever a tcp connection
is established, independent of what happens after (timeout/invalid
request, etc).
The new ISC_THREAD_LOCAL macro unifies usage of platform dependent
Thread Local Storage definition thread_local vs __thread vs
__declspec(thread) to a single macro.
The commit also unifies the required level of support for TLS as for
some parts of the code it was mandatory and for some parts of the code
it wasn't.
FCTX_ATTR_SHUTTINGDOWN needs to be set and tested while holding the node
lock but the rest of the attributes don't as they are task locked. Making
fctx->attributes atomic allows both behaviours without races.
- restore support for tcp-initial-timeout, tcp-idle-timeout,
tcp-keepalive-timeout and tcp-advertised-timeout configuration
options, which were ineffective previously.
when the TCPDNS_CLIENTS_PER_CONN limit has been exceeded for a TCP
DNS connection, switch to sequential mode to ensure that memory cannot
be exhausted by too many simultaneous queries.
This allows a task to be temporary disabled so that objects won't be
processed simultaneously by libuv events and isc_task events. When a
task is paused, currently running events may complete, but no further
event will added to the run queue will be executed until the task is
unpaused.
When a task manager is created, we can now specify an `isc_nm`
object to associate with it; thereafter when the task manager is
placed into exclusive mode, the network manager will be paused.
This is a replacement for the existing isc_socket and isc_socketmgr
implementation. It uses libuv for asynchronous network communication;
"networker" objects will be distributed across worker threads reading
incoming packets and sending them for processing.
UDP listener sockets automatically create an array of "child" sockets
so each worker can listen separately.
TCP sockets are shared amongst worker threads.
A TCPDNS socket is a wrapper around a TCP socket, which handles the
the two-byte length field at the beginning of DNS messages over TCP.
(Other wrapper socket types can be implemented in the future to handle
DNS over TLS, DNS over HTTPS, etc.)
The double-locked queue implementation is still currently in use
in ns_client, but will be replaced by a fetch-and-add array queue.
This commit moves it from queue.h to list.h so that queue.h can be
used for the new data structure, and clean up dependencies between
list.h and types.h. Later, when the ISC_QUEUE is no longer is use,
it will be removed completely.
cppcheck 1.89 enabled certain value flow analysis mechanisms [1] which
trigger null pointer dereference false positives in lib/dns/rpz.c:
lib/dns/rpz.c:582:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:582:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:596:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:596:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:610:7: warning: Possible null pointer dereference: tgt_ip [nullPointer]
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
lib/dns/rpz.c:1419:44: note: Calling function 'adj_trigger_cnt', 4th argument 'NULL' value is 0
adj_trigger_cnt(rpzs, rpz_num, rpz_type, NULL, 0, true);
^
lib/dns/rpz.c:610:7: note: Null pointer dereference
if (KEY_IS_IPV4(tgt_prefix, tgt_ip)) {
^
It seems that cppcheck no longer treats at least some REQUIRE()
assertion failures as fatal, so add extra assertion macro definitions to
lib/isc/include/isc/util.h that are only used when the CPPCHECK
preprocessor macro is defined; these definitions make cppcheck 1.89
behave as expected.
There is an important requirement for these custom definitions to work:
cppcheck must properly treat abort() as a function which does not
return. In order for that to happen, the __GNUC__ macro must be set to
a high enough number (because system include directories are used and
system headers compile attributes away if __GNUC__ is not high enough).
__GNUC__ is thus set to the major version number of the GCC compiler
used, which is what that latter does itself during compilation.
[1] aaeec462e6