bind9

mirror of https://github.com/isc-projects/bind9.git synced 2026-04-29 18:09:11 -04:00

Author	SHA1	Message	Date
Artem Boldariev	4606384345	Extend isc__nm_socket_tcp_nodelay() to accept value This makes it possible to both enable and disable Nagle's algorithm for a TCP socket descriptor, before the change it was possible only to disable it.	2022-12-20 22:13:53 +02:00
Artem Boldariev	f395cd4b3e	Add isc_nm_streamdnssocket (aka Stream DNS) This commit adds an initial implementation of isc_nm_streamdnssocket transport: a unified transport for DNS over stream protocols messages, which is capable of replacing both TCP DNS and TLS DNS transports. Currently, the interface it provides is a unified set of interfaces provided by both of the transports it attempts to replace. The transport is built around "isc_dnsbuffer_t" and "isc_dnsstream_assembler_t" objects and attempts to minimise both the number of memory allocations during network transfers as well as memory usage.	2022-12-20 22:13:51 +02:00
Artem Boldariev	c0c59b55ab	TLS: add an internal function isc__nmhandle_get_selected_alpn() The added function provides the interface for getting an ALPN tag negotiated during TLS connection establishment. The new function can be used by higher level transports.	2022-12-20 21:24:44 +02:00
Artem Boldariev	15e626f1ca	TLS: add manual read timer control mode This commit adds manual read timer control mode, similarly to TCP. This way the read timer can be controlled manually using: * isc__nmsocket_timer_start(); * isc__nmsocket_timer_stop(); * isc__nmsocket_timer_restart(). The change is required to make it possible to implement more sophisticated read timer control policies in DNS transports, built on top of TLS.	2022-12-20 21:24:44 +02:00
Artem Boldariev	9aabd55725	TCP: add manual read timer control mode This commit adds a manual read timer control mode to the TCP code (adding isc__nmhandle_set_manual_timer() as the interface to it). Manual read timer control mode suppresses restarting the read timer when receiving any amount of data. This way the read timer can be controlled manually using: * isc__nmsocket_timer_start(); * isc__nmsocket_timer_stop(); * isc__nmsocket_timer_restart(). The change is required to make it possible to implement more sophisticated read timer control policies in DNS transports, built on top of TCP.	2022-12-20 21:24:44 +02:00
Artem Boldariev	f4760358f8	TLS: expose the ability to (re)start and stop underlying read timer This commit adds implementation of isc__nmsocket_timer_restart() and isc__nmsocket_timer_stop() for generic TLS code in order to make its interface more compatible with that of TCP.	2022-12-20 21:24:44 +02:00
Artem Boldariev	f18a9b3743	TLS: add isc__nmsocket_timer_running() support This commit adds isc__nmsocket_timer_running() support to the generic TLS code in order to make it more compatible with TCP.	2022-12-20 21:24:44 +02:00
Artem Boldariev	c0808532e1	TLS: isc_nm_bad_request() and isc__nmsocket_reset() support This commit adds implementations of isc_nm_bad_request() and isc__nmsocket_reset() to the generic TLS stream code in order to make it more compatible with TCP code.	2022-12-20 21:24:44 +02:00
Ondřej Surý	6bd2b34180	Enable auto-reallocation for all isc_buffer_allocate() buffers When isc_buffer_t buffer is created with isc_buffer_allocate() assume that we want it to always auto-reallocate instead of having an extra call to enable auto-reallocation.	2022-12-20 19:13:48 +01:00
Ondřej Surý	52307f8116	Add internal logging functions to the netmgr Add internal logging functions isc__netmgr_log, isc__nmsocket_log(), and isc__nmhandle_log() that can be used to add logging messages to the netmgr, and change all direct use of isc_log_write() to use those logging functions to properly prefix them with netmgr, nmsocket and nmsocket+nmhandle.	2022-12-14 19:34:48 +01:00
Artem Boldariev	bed5e2bb08	TLS: check for sock->recv_cb when handling received data This commit adds a check if 'sock->recv_cb' might have been nullified during the call to 'sock->recv_cb'. That could happen, e.g. by an indirect call to 'isc_nmhandle_close()' from within the callback when wrapping up. In this case, let's close the TLS connection.	2022-12-02 13:20:37 +02:00
Artem Boldariev	8b7e123528	DoH: Avoid accessing non-atomic listener socket flags when accepting This commit ensures that the non-atomic flags inside a DoH listener socket object (and associated worker) are accessed when doing accept for a connection only from within the context of the dedicated thread, but not other worker threads. The purpose of this commit is to avoid TSAN errors during isc__nmsocket_closing() calls. It is a continuation of `4b5559cd8f`.	2022-12-02 12:16:12 +02:00
Artem Boldariev	4d0c226375	TLS: Avoid accessing non-atomic listener socket flags during HS This commit ensures that the non-atomic flags inside a TLS listener socket object (and associated worker) are accessed when doing handshake for a connection only from within the context of the dedicated thread, but not other worker threads. The purpose of this commit is to avoid TSAN errors during isc__nmsocket_closing() calls. It is a continuation of `4b5559cd8f`.	2022-12-02 12:16:12 +02:00
Artem Boldariev	4b5559cd8f	TLS: Avoid accessing listener socket flags from other threads This commit ensures that the flags inside a TLS listener socket object (and associated worker) are accessed when accepting a connection only from within the context of the dedicated thread, but not other worker threads.	2022-12-01 21:07:49 +02:00
Ondřej Surý	e3c628d562	Honour single read per client isc_nm_read() call in the TLSDNS The TLSDNS transport was not honouring the single read callback for TLSDNS client. It would call the read callbacks repeatedly in case the single TLS read would result in multiple DNS messages in the decoded buffer.	2022-12-01 18:31:05 +01:00
Artem Boldariev	2bfc079946	TLS stream: always handle send callbacks asynchronously This commit ensures that send callbacks are always called from within the context of the socket's worker thread, even in the case of a shutting down/inactive socket, just like the TCP transport does, with which TLS attempts to be as compatible as possible.	2022-11-30 18:09:52 +02:00
Artem Boldariev	ef659365ce	TLS Stream: use ISC_R_CANCELLED error when shutting down This commit changes ISC_R_NOTCONNECTED error code to ISC_R_CANCELLED when attempting to start reading data on the shutting down socket in order to make its behaviour compatible with that of TCP and not break the common code in the unit tests.	2022-11-30 18:09:52 +02:00
Artem Boldariev	fb9955a372	TLS Stream: fix isc_nm_read_stop() and reading flags handling It turned out that after the latest Network Manager refactoring 'sock->reading' flag was not processed correctly. Due to this isc_nm_read_stop() might not work as expected because reading from the underlying TCP socket could have been resumed in 'tls_do_bio()' regardless of the 'sock->reading' value. This bug did not seem to cause problems with DoH, so it was not noticed, but Stream DNS has stricter expectations regarding the underlying transport. In addition to the above, the 'sock->recv_read' flag was completely ignored and the corresponding logic was completely unimplemented. That made it impossible to implement one fine detail compared to TCP: once reading is started, it could be satisfied by one datum reading. This commit fixes the issues above.	2022-11-30 18:09:52 +02:00
Artem Boldariev	9b1c8c03fd	TCP: use uv_try_write() to optimise sends This commit makes the TCP code use uv_try_write() on a best effort basis, just like the TCP DNS and TLS DNS code does. This optimisation was added in 'caa5b6548a11da6ca772d6f7e10db3a164a18f8d', but a similar change was mistakenly omitted for the generic TCP code. This commit fixes that.	2022-11-29 13:41:10 +02:00
Michal Nowak	afdb41a5aa	Update sources to Clang 15 formatting	2022-11-29 08:54:34 +01:00
Ondřej Surý	f3004da3a5	Make the netmgr send callback to be asynchronous only when needed Previously, the send callback would be synchronous only on success. Add an option (similar to what other callbacks have) to decide whether we need the asynchronous send callback on a higher level. On a general level, we need the asynchronous callbacks to happen only when we are invoking the callback from the public API. If the path to the callback went through the libuv callback or netmgr callback, we are already on asynchronous path, and there's no need to make the call to the callback asynchronous again. For the send callback, this means we need the asynchronous path for failure paths inside the isc_nm_send() (which calls isc__nm_udp_send(), isc__nm_tcp_send(), etc...) - all other invocations of the send callback could be synchronous, because those are called from the respective libuv send callbacks.	2022-11-25 15:46:25 +01:00
Ondřej Surý	5ca49942a3	Make the netmgr read callback to be asynchronous only when needed Previously, the read callback would be synchronous only on success or timeout. Add an option (similar to what other callbacks have) to decide whether we need the asynchronous read callback on a higher level. On a general level, we need the asynchronous callbacks to happen only when we are invoking the callback from the public API. If the path to the callback went through the libuv callback or netmgr callback, we are already on asynchronous path, and there's no need to make the call to the callback asynchronous again. For the read callback, this means we need the asynchronous path for failure paths inside the isc_nm_read() (which calls isc__nm_udp_read(), isc__nm_tcp_read(), etc...) - all other invocations of the read callback could be synchronous, because those are called from the respective libuv or netmgr read callbacks.	2022-11-25 15:46:15 +01:00
Evan Hunt	67c0128ebb	Fix an error when building with --disable-doh The netievent handler for isc_nmsocket_set_tlsctx() was inadvertently ifdef'd out when BIND was built with --disable-doh, resulting in an assertion failure on startup when DoT was configured.	2022-10-24 13:54:39 -07:00
Artem Boldariev	09dcc914b4	TLS Stream: handle successful TLS handshake after listener shutdown It was possible that the accept callback could be called after listener shutdown. In such a case the callback pointer equals NULL, leading to a segmentation fault. This commit fixes that.	2022-10-18 18:30:24 +03:00
Artem Boldariev	5ab2c0ebb3	Synchronise stop listening operation for multi-layer transports This commit introduces a primitive isc__nmsocket_stop() which performs shutting down on a multilayered socket ensuring the proper order of the operations. The shared data within the socket object can be destroyed after the call completed, as it is guaranteed to not be used from within the context of other worker threads.	2022-10-18 12:06:00 +03:00
Tony Finch	a34a2784b1	De-duplicate some calls to strerror_r() Specifically, when reporting an unexpected or fatal error.	2022-10-17 11:58:26 +01:00
Artem Boldariev	d62eb206f7	Fix isc_nmsocket_set_tlsctx() During loop manager refactoring isc_nmsocket_set_tlsctx() was not properly adapted. The function is expected to broadcast the new TLS context for every worker, but this behaviour was accidentally broken.	2022-10-14 23:06:31 +03:00
Ondřej Surý	b6b7a6886a	Don't set load-balancing socket option on the UDP connect sockets The isc_nm_udpconnect() erroneously set the reuse port with load-balancing on the outgoing connected UDP sockets. This socket option only makes sense for the listening sockets. Don't set the load-balancing reuse port option on the outgoing UDP sockets.	2022-10-12 15:36:25 +02:00
Artem Boldariev	eaebb92f3e	TLS DNS: fix certificate verification error message reporting This commit fixes TLS DNS verification error message reporting which we probably broke during one of the recent networking code refactorings. This prevents e.g. dig from producing useful error messages related to TLS certificate verification.	2022-10-12 16:24:04 +03:00
Artem Boldariev	6789b88d25	TLS: clear error queue before doing IO or calling SSL_get_error() Ensure that the TLS error queue is empty before calling SSL_get_error() or doing SSL I/O so that the result will not get affected by prior error statuses. In particular, the improper error handling led to intermittent unit test failure and, thus, could be responsible for some of the system test failures and other intermittent TLS-related issues. See here for more details: https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html In particular, it mentions the following: > The current thread's error queue must be empty before the TLS/SSL > I/O operation is attempted, or SSL_get_error() will not work > reliably. As we use the result of SSL_get_error() to decide on I/O operations, we need to ensure that it works reliably by cleaning the error queue. TLS DNS: empty error queue before attempting I/O	2022-10-12 16:24:04 +03:00
Aram Sargsyan	be95ba0119	Remove a superfluous check of sock->fd against -1 The check is left from when tcp_connect_direct() called isc__nm_socket() and it was uncertain whether it had succeeded, but now isc__nm_socket() is called before tcp_connect_direct(), so sock->fd cannot be -1. *** CID 357292: (REVERSE_NEGATIVE) /lib/isc/netmgr/tcp.c: 309 in isc_nm_tcpconnect() 303 304 atomic_store(&sock->active, true); 305 306 result = tcp_connect_direct(sock, req); 307 if (result != ISC_R_SUCCESS) { 308 atomic_store(&sock->active, false); >>> CID 357292: (REVERSE_NEGATIVE) >>> You might be using variable "sock->fd" before verifying that it is >= 0. 309 if (sock->fd != (uv_os_sock_t)(-1)) { 310 isc__nm_tcp_close(sock); 311 } 312 isc__nm_connectcb(sock, req, result, true); 313 } 314	2022-10-12 08:21:35 +00:00
Ondřej Surý	c1d26b53eb	Add and use semantic patch to replace isc_mem_get/allocate+memset Add new semantic patch to replace the straightforward uses of: ptr = isc_mem_{get,allocate}(..., size); memset(ptr, 0, size); with the new API call: ptr = isc_mem_{get,allocate}x(..., size, ISC_MEM_ZERO);	2022-10-05 16:44:05 +02:00
Ondřej Surý	173c352452	Call the isc__nm_udp_send() callbacks asynchronously on shutdown The isc__nm_udp_send() callback would be called synchronously when shutting down or when the socket has been closed. This could lead to double locking in the calling code and thus those callbacks need to be called asynchronously.	2022-09-29 11:06:58 +02:00
Ondřej Surý	0086ebf3fc	Bump the libuv requirement to libuv >= 1.34.0 By bumping the minimum libuv version to 1.34.0, it allows us to remove all libuv shims we ever had and makes the code much cleaner. The up-to-date libuv is available in all distributions supported by BIND 9.19+ either natively or as a backport.	2022-09-27 17:09:10 +02:00
Ondřej Surý	fffd444440	Cleanup the asynchronous code in the stream implementations After the loopmgr work has been merged, we can now cleanup the TCP and TLS protocols a little bit, because there are stronger guarantees that the sockets will be kept on the respective loops/threads. We only need asynchronous calls for listening sockets (start, stop) and reading from the TCP (because the isc_nm_read() might be called from the read callback again). This commit does the following changes (they are intertwined together): 1. Cleanup most of the asynchronous events in the TCP code, and add comments for the events that need to be kept asynchronous. 2. Remove isc_nm_resumeread() from the netmgr API, and replace isc_nm_resumeread() calls with existing isc_nm_read() calls. 3. Remove isc_nm_pauseread() from the netmgr API, and replace isc_nm_pauseread() calls with a new isc_nm_read_stop() call. 4. Disable the isc_nm_cancelread() for the streaming protocols, only the datagram-like protocols can use isc_nm_cancelread(). 5. Add isc_nmhandle_close() that can be used to shutdown the socket earlier than after the last detach. Formerly, the socket would be closed only after all reading and sending would be finished and the last reference would be detached. The new isc_nmhandle_close() can be used to close the underlying socket earlier, so all the other asynchronous calls would call their respective callbacks immediately. Co-authored-by: Ondřej Surý <ondrej@isc.org> Co-authored-by: Artem Boldariev <artem@isc.org>	2022-09-22 14:51:15 +02:00
Ondřej Surý	f6e4f620b3	Use the semantic patch to do the unsigned -> unsigned int change Apply the semantic patch on the whole code base to get rid of 'unsigned' usage in favor of explicit 'unsigned int'.	2022-09-19 15:56:02 +02:00
Ondřej Surý	b1026dd4c1	Add missing isc_refcount_destroy() for isc__nmsocket_t The destructor for the isc__nmsocket_t was missing a call to isc_refcount_destroy() on the reference counter, which might lead to spurious ThreadSanitizer data race warnings if we ever change the acquire-release memory order in the isc_refcount_decrement().	2022-09-19 14:38:56 +02:00
Ondřej Surý	9b8d432403	Reorder the uv_close() calls to close the socket immediately Simplify the closing code - during the loopmgr implementation, it was discovered that the various lists used by the uv_loop_t aren't FIFO, but LIFO. See doc/dev/libuv.md for more details. With this knowledge, we can close the protocol handles (uv_udp_t and uv_tcp_t) and uv_timer_t at the same time by reordering the uv_close() calls, and thus making sure that after calling the isc__nm_stoplistening(), the code will not issue any additional callback calls (accept, read) on the socket that stopped listening. This might help with the TLS and DoH shutting down sequence as described in the [GL #3509] as we now stop the reading, stop the timer and call the uv_close() as early as possible.	2022-09-19 14:38:56 +02:00
Ondřej Surý	eac8bc5c1a	Prevent unexpected UDP client read callbacks The network manager UDP code was misinterpreting when the libuv called the udp_recv_cb with nrecv == 0 and addr == NULL -> this doesn't really mean that the "stream" has ended, but the libuv indicates that the receive buffer can be freed. This could lead to assertion failure in the code that calls isc_nm_read() from the network manager read callback due to the extra spurious callbacks. Properly handle the extra callback calls from the libuv in the client read callback, and refactor the UDP isc_nm_read() implementation to be synchronous, so no datagram is lost between the time that we stop the reading from the UDP socket and we restart it again in the asynchronous udpread event. Add a unit test that tests the isc_nm_read() call from the read callback to receive two datagrams.	2022-09-19 12:20:41 +02:00
Michał Kępień	4c49068531	Fix building with --disable-doh Commit `b69e783164` inadvertently caused builds using the --disable-doh switch to fail, by putting the declaration of the isc__nm_async_settlsctx() function inside an #ifdef block that is only evaluated when DNS-over-HTTPS support is enabled. This results in the following compilation errors being triggered: netmgr/netmgr.c:2657:1: error: no previous prototype for 'isc__nm_async_settlsctx' [-Werror=missing-prototypes] 2657 | isc__nm_async_settlsctx(isc__networker_t worker, isc__netievent_t ev0) { | ^~~~~~~~~~~~~~~~~~~~~~~ Fix by making the declaration of the isc__nm_async_settlsctx() function in lib/isc/netmgr/netmgr-int.h visible regardless of whether DNS-over-HTTPS support is enabled or not.	2022-09-07 12:50:08 +02:00
Aram Sargsyan	2f11e48f0d	Fix isc_nm_listentlsdns() error path bug The isc_nm_listentlsdns() function erroneously calls isc__nm_tcpdns_stoplistening() instead of isc__nm_tlsdns_stoplistening() when something goes wrong, which can cause an assertion failure.	2022-09-05 14:58:52 +00:00
Ondřej Surý	718e92c31a	Clear the callbacks when isc_nm_stoplistening() is called When we are closing the listening sockets, there's a time window in which the TCP connection could be accepted although the respective stoplistening function has already returned control to the caller. Clear the accept callback function early, so it doesn't get called when we are not interested in the incoming connections anymore.	2022-08-26 09:09:25 +02:00
Ondřej Surý	b69e783164	Update netmgr, tasks, and applications to use isc_loopmgr Previously: * applications were using isc_app as the base unit for running the application and signal handling. * networking was handled in the netmgr layer, which would start a number of threads, each with a uv_loop event loop. * task/event handling was done in the isc_task unit, which used netmgr event loops to run the isc_event calls. In this refactoring: * the network manager now uses isc_loop instead of maintaining its own worker threads and event loops. * the taskmgr that manages isc_task instances now also uses isc_loopmgr, and every isc_task runs on a specific isc_loop bound to the specific thread. * applications have been updated as necessary to use the new API. * new ISC_LOOP_TEST macros have been added to enable unit tests to run isc_loop event loops. unit tests have been updated to use this where needed.	2022-08-26 09:09:24 +02:00
Artem Boldariev	32565d0d65	TLS: do not ignore readpaused flag in certain circumstances In some circumstances generic TLS code could have resumed data reading unexpectedly on the TCP layer code. Due to this, the behaviour of isc_nm_pauseread() and isc_nm_resumeread() might have been unexpected. This commit fixes that. The bug does not seem to have real consequences in the existing code due to the way the code is used. However, the bug could have led to unexpected behaviour and, at any rate, makes the TLS code behave differently from the TCP code, with which it attempts to be as compatible as possible.	2022-08-02 14:02:01 +03:00
Artem Boldariev	c52c691b18	TLS: fix double resumption in isc__nm_tls_resumeread() This commit fixes an obvious error in isc__nm_tls_resumeread() so that read cannot be resumed twice.	2022-07-26 14:25:59 +03:00
Artem Boldariev	5d450cd0ba	TLS: clear 'errno' when handling SSL status Sometimes tls_do_bio() might be called when there is no new data to process (most notably, when resuming reads). In such a case the internal TLS session state will remain untouched and the old value in 'errno' will alter the result of the SSL_get_error() call, possibly making it return SSL_ERROR_SYSCALL. This value will be treated as an error, and will lead to closing the connection, which is not what is expected.	2022-07-26 14:25:59 +03:00
Ondřej Surý	3e10d3b45f	Cleanup the STATID_CONNECT and STATID_CONNECTFAIL stat counters The STATID_CONNECT and STATID_CONNECTFAIL statistics were used incorrectly. The STATID_CONNECT was incremented twice (once in the *_connect_direct() and once in the callback) and STATID_CONNECTFAIL would not be incremented at all if the failure happened in the callback. Closes: #3452	2022-07-14 14:34:53 +02:00
Ondřej Surý	a280855f7b	Handle the transient TCP connect() failures on FreeBSD On FreeBSD (and perhaps other *BSD) systems, the TCP connect() call (via uv_tcp_connect()) can fail with transient UV_EADDRINUSE error. The UDP code already handles this by trying three times (is a charm) before giving up. Add code for the TCP, TCPDNS and TLSDNS layers to also try three times before giving up by calling uv_tcp_connect() from the callback two more times on UV_EADDRINUSE error. Additionally, stop the timer only if we succeed or on hard error via isc__nm_failed_connect_cb().	2022-07-14 14:20:10 +02:00
Artem Boldariev	ffcb54211e	TLS: do not ignore accept callback result Before this change the TLS code would ignore the accept callback result, and would not try to gracefully close the connection. This had not been noticed, as it is not really required for DoH. Now the code tries to shut down the TLS connection gracefully when accepting it is not successful.	2022-07-12 14:40:22 +03:00
Artem Boldariev	8585b92f98	TLSDNS: try to pass incoming data to OpenSSL if there are any Otherwise the code path will lead to a call to SSL_get_error() returning SSL_ERROR_SSL, which in turn might lead to closing the connection too early in an unexpected way, which is clearly not what is intended. The issue was found when working on the loopmgr branch and appears to be timing related as well. Might be responsible for some unexpected transmission failures e.g. on zone transfers.	2022-07-12 14:40:22 +03:00
Artem Boldariev	fc74b15e67	TLS: bail out earlier when NM is stopping In some operations - most prominently when establishing connection - it might be beneficial to bail out earlier when the network manager is stopping. The issue is backported from loopmgr branch, where such a change is not only beneficial, but required.	2022-07-12 14:40:22 +03:00
Artem Boldariev	ac4fb34f18	TLS: sometimes TCP conn. handle might be NULL when connecting In some cases - in particular, in case of errors - NULL might be passed to a connection callback instead of a handle, which could have led to an abort. This commit ensures that such a situation will not occur. The issue was found when working on the loopmgr branch.	2022-07-12 14:40:22 +03:00
Artem Boldariev	88524e26ec	TLS: try to close sockets whenever there are no pending operations This commit ensures that the underlying TCP socket of a TLS connection gets closed earlier whenever there are no pending operations on it. In the loop-manager branch, in some circumstances the connection could have remained open for far too long for no reason. This commit ensures that will not happen.	2022-07-12 14:40:22 +03:00
Artem Boldariev	237ce05b89	TLS: Implement isc_nmhandle_setwritetimeout() This commit adds a proper implementation of isc_nmhandle_setwritetimeout() for TLS connections. Now it passes the value to the underlying TCP handle.	2022-07-12 14:40:22 +03:00
Evan Hunt	a499794984	REQUIRE should not have side effects it's a style violation to have REQUIRE or INSIST contain code that must run for the server to work. this was being done with some atomic_compare_exchange calls. these have been cleaned up. uses of atomic_compare_exchange in assertions have been replaced with a new macro atomic_compare_exchange_enforced, which uses RUNTIME_CHECK to ensure that the exchange was successful.	2022-07-05 12:22:55 -07:00
Artem Boldariev	d2e13ddf22	Update the set of HTTP endpoints on reconfiguration This commit ensures that on reconfiguration the set of HTTP endpoints (=paths) is being updated within HTTP listeners.	2022-06-28 15:42:38 +03:00
Artem Boldariev	e72962d5f1	Update max concurrent streams limit in HTTP listeners on reconfig This commit ensures that HTTP listeners concurrent streams limit gets updated properly on reconfiguration.	2022-06-28 15:42:38 +03:00
Artem Boldariev	e616d7f240	TLS DNS: do not call accept callback twice Before the changes from this commit were introduced, the accept callback function would get called twice when accepting a connection, once at each of these stages: * when accepting the TCP connection; * when handshake has completed. That is clearly an error, as it should have been called only once. As far as I understand it the mistake is a result of TLS DNS transport being essentially a fork of TCP transport, where calling the accept callback immediately after accepting TCP connection makes sense. This commit fixes this mistake. It did not have any very serious consequences because in BIND the accept callback only checks an ACL and updates stats.	2022-06-15 14:21:11 +03:00
Ondřej Surý	b432d5d3bc	Gracefully handle uv_read_start() failures Under specific rare timing circumstances the uv_read_start() could fail with UV_EINVAL when the connection is reset between the connect (or accept) and the uv_read_start() call on the nmworker loop. Handle such a situation gracefully by propagating the errors from uv_read_start() into upper layers, so the socket can be internally closed.	2022-06-14 11:33:02 +02:00
Artem Boldariev	9abb00bb5f	Fix an abort in DoH (client-side) when writing on closing sock The commit fixes a corner case in client-side DoH code, when a write attempt is done on a closing socket (session). The change ensures that the write call-back will be called with a proper error code (see failed_send_cb() call in client_httpsend()).	2022-05-20 20:18:40 +03:00
Artem Boldariev	245f7cec2e	Avoid aborting when uv_timer_start() is used on a closing socket In such a case it will return UV_EINVAL (-EINVAL), leading to aborting, as the code expects the function to succeed.	2022-05-20 20:18:40 +03:00
Artem Boldariev	86465c1dac	DoT: implement TLS client session resumption This commit extends DoT code with TLS client session resumption support implemented on top of the TLS client session cache.	2022-05-20 20:17:48 +03:00
Artem Boldariev	90bc13a5d5	TLS stream/DoH: implement TLS client session resumption This commit extends TLS stream code and DoH code with TLS client session resumption support implemented on top of the TLS client session cache.	2022-05-20 20:17:45 +03:00
Ondřej Surý	61117840c1	Move setting the sock->write_timeout to the async_*send Setting the sock->write_timeout from the TCP, TCPDNS, and TLSDNS send functions could lead to a (harmless) data race when setting the value for the first time when the isc_nm_send() function would be called from a thread not matching the socket we are sending to. Move the setting of sock->write_timeout to the matching async function, which is always called from the matching thread.	2022-05-19 22:36:47 +02:00
Michal Nowak	c9aca34b1e	Merge tag 'v9_19_1' BIND 9.19.1	2022-05-19 10:55:42 +02:00
Evan Hunt	6936db2f59	Always use the number of CPUS for resolver->ntasks Since the fctx hash table is now self-resizing, and resolver tasks are selected to match the thread that created the fetch context, there shouldn't be any significant advantage to having multiple tasks per CPU; a single task per thread should be sufficient. Additionally, the fetch context is always pinned to the calling netmgr thread to minimize the contention just to coalesced fetches - if two threads start the same fetch, it will be pinned to the first one to get the bucket.	2022-05-19 09:27:33 +02:00
Artem Boldariev	a696be6a2d	Fix a crash by avoiding destroying TLS stream socket too early This commit fixes a crash in generic TLS stream code, which could be reproduced during some runs of the 'sslyze' tool. The intention of this commit is twofold. Firstly, it ensures that the TLS socket object cannot be destroyed too early. Now it is being deleted alongside the underlying TCP socket object. Secondly, it ensures that the TLS socket object cannot be destroyed as a result of calling 'tls_do_bio()' (the primary function which performs encryption/decryption during the IO) as the code did not expect that. This code path is fixed now.	2022-05-04 19:38:16 +02:00
Ondřej Surý	b43812692d	Move netmgr/uv-compat.h to <isc/uv.h> As we are going to use libuv outside of the netmgr, we need the shims to be readily available for the rest of the codebase. Move the "netmgr/uv-compat.h" to <isc/uv.h> and netmgr/uv-compat.c to uv.c, and as a rule of thumb, the users of libuv should include <isc/uv.h> instead of <uv.h> directly. Additionally, merge netmgr/uverr2result.c into uv.c and rename the single function from isc__nm_uverr2result() to isc_uverr2result().	2022-05-03 10:02:19 +02:00
Ondřej Surý	24c3879675	Move socket related functions to netmgr/socket.c Move the netmgr socket related functions from netmgr/netmgr.c and netmgr/uv-compat.c to netmgr/socket.c, so they are all present in the same place. Adjust the names of a couple of internal functions accordingly.	2022-05-03 09:52:49 +02:00
Tony Finch	66b3cb9732	Remove several superfluous newlines in log messages	2022-05-02 23:49:38 +01:00
Artem Boldariev	978f97dcdd	TLSDNS: call send callbacks after only the data was sent This commit ensures that write callbacks are getting called only after the data has been sent via the network. Without this fix, a situation could appear when a write callback could get called before the actual encrypted data would have been sent to the network. Instead, it would get called right after it would have been passed to the OpenSSL (i.e. encrypted). Most likely, the issue does not reveal itself often because the callback call was asynchronous, so in most cases it should have been called after the data has been sent, but that was not guaranteed by the code logic. Also, this commit removes one memory allocation (netievent) from a hot path, as there is no need to call this callback asynchronously anymore.	2022-04-27 17:44:23 +03:00
Ondřej Surý	407b37c3f2	Set IP(V6)_RECVERR on connect UDP sockets (via libuv) The connect()ed UDP socket provides feedback on a variety of ICMP errors (eg port unreachable) which bind can then use to decide what to do with errors (report them to the client, try again with a different nameserver etc). However, Linux's implementation does not report what it considers "transient" conditions, which is defined as Destination host Unreachable, Destination network unreachable, Source Route Failed and Message Too Big. Explicitly enable IP_RECVERR / IPV6_RECVERR (via libuv uv_udp_bind() flag) to learn about ICMP destination network/host unreachable.	2022-04-26 12:22:18 +02:00
Ondřej Surý	eb8f2974b1	Abort when libuv at runtime mismatches libuv at compile time When we compile with libuv that has some capabilities via flags passed to f.e. uv_udp_listen() or uv_udp_bind(), the call with such flags would fail with invalid arguments when older libuv version is linked at the runtime that doesn't understand the flag that was available at the compile time. Enforce minimal libuv version when flags have been available at the compile time, but are not available at the runtime. This check is less strict than enforcing the runtime libuv version to be same or higher than compile time libuv version.	2022-04-26 11:40:40 +02:00
Ondřej Surý	f55a4d3e55	Allow listening on less than nworkers threads For some applications, it's useful to not listen on full battery of threads. Add workers argument to all isc_nm_listen*() functions and convenience ISC_NM_LISTEN_ONE and ISC_NM_LISTEN_ALL macros.	2022-04-19 11:08:13 +02:00
Artem Boldariev	df317184eb	Add isc_nmsocket_set_tlsctx() This commit adds isc_nmsocket_set_tlsctx() - an asynchronous function that replaces the TLS context within a given TLS-enabled listener socket object. It is based on the newly added reference counting functionality. The intention of adding this function is to add functionality to replace a TLS context without recreating the whole socket object, including the underlying TCP listener socket, as a BIND process might not have enough permissions to re-create it fully on reconfiguration.	2022-04-06 18:45:57 +03:00
Artem Boldariev	25609156a5	Maintain a per-thread TLS ctx reference in TLS stream code This commit changes the generic TLS stream code to maintain a per-worker thread TLS context reference.	2022-04-06 18:45:57 +03:00
Artem Boldariev	9256026d18	Use isc_tlsctx_attach() in TLS DNS code This commit adds proper reference counting for TLS contexts into generic TLS DNS (DoT) code.	2022-04-06 18:45:57 +03:00
Artem Boldariev	b52d46612f	Use isc_tlsctx_attach() in TLS stream code This commit adds proper reference counting for TLS contexts into generic TLS stream code.	2022-04-06 18:45:57 +03:00
Ondřej Surý	142c63dda8	Enable the load-balance-sockets configuration Previously, HAVE_SO_REUSEPORT_LB has been defined only in the private netmgr-int.h header file, making the configuration of load balanced sockets inoperable. Move the missing HAVE_SO_REUSEPORT_LB define the isc/netmgr.h and add missing isc_nm_getloadbalancesockets() implementation.	2022-04-05 01:30:58 +02:00
Ondřej Surý	85c6e797aa	Add option to configure load balance sockets Previously, the option to enable kernel load balancing of the sockets was always enabled when supported by the operating system (SO_REUSEPORT on Linux and SO_REUSEPORT_LB on FreeBSD). It was reported that in scenarios where the networking threads are also responsible for processing long-running tasks (like RPZ processing, CATZ processing or large zone transfers), this could lead to intermittent brownouts for some clients, because the thread assigned by the operating system might be busy. In such scenarios, the overall performance would be better served by threads competing over the sockets because the idle threads can pick up the incoming traffic. Add new configuration option (`load-balance-sockets`) to allow enabling or disabling the load balancing of the sockets.	2022-04-04 23:10:04 +02:00
Ondřej Surý	30e0fd942b	Remove task privileged mode Previously, the task privileged mode has been used only when the named was starting up and loading the zones from the disk as the "first" thing to do. The privileged task was setup with quantum == 2, which made the taskmgr/netmgr spin around the privileged queue processing two events at a time. The same effect can be achieved by setting the quantum to UINT_MAX (i.e. practically unlimited) for the loadzone task, hence the privileged task mode was removed in favor of just processing all the events on the loadzone task in a single task_run().	2022-04-01 23:55:26 +02:00
Ondřej Surý	4dceab142d	Consistently use UNREACHABLE() instead of ISC_UNREACHABLE() In a couple of places, we missed replacing INSIST(0) or ISC_UNREACHABLE() with UNREACHABLE() on some branches. Replace all ISC_UNREACHABLE() or INSIST(0) calls with UNREACHABLE().	2022-03-28 23:26:08 +02:00
Artem Boldariev	783663db80	Add ISC_R_TLSBADPEERCERT error code to the TLS related code This commit adds support for ISC_R_TLSBADPEERCERT error code, which is supposed to be used to signal for TLS peer certificates verification in dig and other code. The support for this error code is added to our TLS and TLS DNS implementations. This commit also adds isc_nm_verify_tls_peer_result_string() function which is supposed to be used to get a textual description of the reason for getting a ISC_R_TLSBADPEERCERT error.	2022-03-28 15:32:30 +03:00
Ondřej Surý	9de10cd153	Remove extrahandle size from netmgr Previously, it was possible to assign a bit of memory space in the nmhandle to store the client data. This was complicated and prevents further refactoring of isc_nmhandle_t caching (future work). Instead of caching the data in the nmhandle, allocate the hot-path ns_client_t objects from per-thread clientmgr memory context and just assign it to the isc_nmhandle_t via isc_nmhandle_set().	2022-03-25 10:38:35 +01:00
Ondřej Surý	584f0d7a7e	Simplify way we tag unreachable code with only ISC_UNREACHABLE() Previously, the unreachable code paths would have to be tagged with: INSIST(0); ISC_UNREACHABLE(); There were also older parts of the code that used comment annotation: /* NOTREACHED */ Unify the handling of unreachable code paths to just use: UNREACHABLE(); The UNREACHABLE() macro now asserts when reached and also uses __builtin_unreachable(); when such builtin is available in the compiler.	2022-03-25 08:33:43 +01:00
Ondřej Surý	fe7ce629f4	Add FALLTHROUGH macro for __attribute__((fallthrough)) Gcc 7+ and Clang 10+ have implemented __attribute__((fallthrough)) which is explicit version of the /* FALLTHROUGH */ comment we are currently using. Add and apply FALLTHROUGH macro that uses the attribute if available, but does nothing on older compilers. In one case (lib/dns/zone.c), using the macro revealed that we were using the /* FALLTHROUGH */ comment in wrong place, remove that comment.	2022-03-25 08:33:43 +01:00
Ondřej Surý	d70daa29f7	Make netmgr the authority on number of threads running Instead of passing the "workers" variable back and forth along with passing the single isc_nm_t instance, add isc_nm_getnworkers() function that returns the number of netmgr threads that are running. Change the ns_interfacemgr and ns_taskmgr to utilize the newly acquired knowledge.	2022-03-18 21:53:28 +01:00
Ondřej Surý	bfa4b9c141	Run .closehandle_cb asynchronously in nmhandle_detach_cb() When sock->closehandle_cb is set, we need to run nmhandle_detach_cb() asynchronously to ensure correct order of multiple packets processing in the isc__nm_process_sock_buffer(). When not run asynchronously, it would cause: a) out-of-order processing of the return codes from processbuffer(); b) stack growth because the next TCP DNS message read callback will be called from within the current TCP DNS message read callback. The sock->closehandle_cb is set to isc__nm_resume_processing() for TCP sockets which calls isc__nm_process_sock_buffer(). If the read callback (called from isc__nm_process_sock_buffer()->processbuffer()) doesn't attach to the nmhandle (f.e. because it wants to drop the processing or we send the response directly via uv_try_write()), the isc__nm_resume_processing() (via .closehandle_cb) would call isc__nm_process_sock_buffer() recursively. The below shortened code path shows how the stack can grow: 1: ns__client_request(handle, ...); 2: isc_nm_tcpdns_sequential(handle); 3: ns_query_start(client, handle); 4: query_lookup(qctx); 5: query_send(qctx->client); 6: isc__nmhandle_detach(&client->reqhandle); 7: nmhandle_detach_cb(&handle); 8: sock->closehandle_cb(sock); // isc__nm_resume_processing 9: isc__nm_process_sock_buffer(sock); 10: processbuffer(sock); // isc__nm_tcpdns_processbuffer 11: isc_nmhandle_attach(req->handle, &handle); 12: isc__nm_readcb(sock, req, ISC_R_SUCCESS); 13: isc__nm_async_readcb(NULL, ...); 14: uvreq->cb.recv(...); // ns__client_request Instead, if 'sock->closehandle_cb' is set, we need to detach the handle asynchronously in 'isc__nmhandle_detach', so that line 8 in the code flow above does not start this recursion. This ensures the correct order when processing multiple packets in the function 'isc__nm_process_sock_buffer()' and prevents the stack growth.
When not run asynchronously, the out-of-order processing leaves the first TCP socket open until all requests on the stream have been processed. If the pipelining is disabled on the TCP via `keep-response-order` configuration option, named would keep the first socket in lingering CLOSE_WAIT state when the client sends an incomplete packet and then closes the connection from the client side.	2022-03-16 22:11:49 +01:00
Ondřej Surý	6ddac2d56d	On shutdown, reset the established TCP connections Previously, the established TCP connections (both client and server) would be gracefully closed waiting for the write timeout. Don't wait for TCP connections to gracefully shutdown, but directly reset them for faster shutdown.	2022-03-11 09:56:57 +01:00
Ondřej Surý	a761aa59e3	Change single write timer to per-send timers Previously, there was a single per-socket write timer that would get restarted for every new write. This turned out to be insufficient because the other side could keep resetting the timer, and never reading back the responses. Change the single write timer to per-send timer which would in turn reset the TCP connection on the first send timeout.	2022-03-11 09:56:57 +01:00
Ondřej Surý	f251d69eba	Remove usage of deprecated ATOMIC_VAR_INIT() macro The C17 standard deprecated ATOMIC_VAR_INIT() macro (see [1]). Follow suit and remove the ATOMIC_VAR_INIT() usage in favor of simple assignment of the value as this is what all supported stdatomic.h implementations do anyway: * MacOSX.platform: #define ATOMIC_VAR_INIT(__v) {__v} * Gcc stdatomic.h: #define ATOMIC_VAR_INIT(VALUE) (VALUE) 1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1138r0.pdf	2022-03-08 23:55:10 +01:00
Ondřej Surý	8098a58581	Set TCP maximum segment size to minimum size of 1220 Previously the socket code would set the TCPv6 maximum segment size to minimum value to prevent IP fragmentation for TCP. This was not yet implemented for the network manager. Implement network manager functions to set and use minimum MTU socket option and set the TCP_MAXSEG socket option for both IPv4 and IPv6 and use those to clamp the TCP maximum segment size for TCP, TCPDNS and TLSDNS layers in the network manager to 1220 bytes, that is 1280 (IPv6 minimum link MTU) minus 40 (IPv6 fixed header) minus 20 (TCP fixed header). We already rely on a similar value for UDP to prevent IP fragmentation and it makes sense to use the same value for IPv4 and IPv6 because the modern networks are required to support IPv6 packet sizes. If there's need for small TCP segment values, the MTU on the interfaces needs to be properly configured.	2022-03-08 10:27:05 +01:00
Ondřej Surý	5d34a14f22	Set minimum MTU (1280) on IPv6 sockets The IPV6_USE_MIN_MTU socket option directs the IP layer to limit the IPv6 packet size to the minimum required supported MTU from the base IPv6 specification, i.e. 1280 bytes. Many implementations of TCP running over IPv6 neglect to check the IPV6_USE_MIN_MTU value when performing MSS negotiation and when constructing a TCP segment despite MSS being defined to be the MTU less the IP and TCP header sizes (60 bytes for IPv6). This leads to oversized IPv6 packets being sent resulting in unintended Path Maximum Transport Unit Discovery (PMTUD) being performed and to fragmented IPv6 packets being sent. Add and use a function to set socket option to limit the MTU on IPv6 sockets to the minimum MTU (1280) both for UDP and TCP.	2022-03-08 10:27:05 +01:00
Ondřej Surý	6bd025942c	Replace netievent lock-free queue with simple locked queue The current implementation of isc_queue uses Michael-Scott lock-free queue that in turn uses hazard pointers. It was discovered that the way we use the isc_queue, such a complicated mechanism isn't really needed, because most of the time, we either execute the work directly when on nmthread (in case of UDP) or schedule the work from the matching nmthreads. Replace the current implementation of the isc_queue with a simple locked ISC_LIST. There's a slight improvement - since copying the whole list is very lightweight - we move the queue into a new list before we start the processing and locking just for moving the queue and not for every single item on the list. NOTE: There's room for future improvements - since we don't guarantee the order in which the netievents are processed, we could have two lists - one unlocked that would be used when scheduling the work from the matching thread and one locked that would be used from non-matching thread.	2022-03-04 13:49:51 +01:00
Ondřej Surý	b220fb32bd	Handle TCP sockets in isc__nmsocket_reset() The isc__nmsocket_reset() was missing a case for raw TCP sockets (used by RNDC and DoH) which would cause an assertion failure when write timeout would be triggered. TCP sockets are now also properly handled in isc__nmsocket_reset().	2022-02-28 02:06:03 -08:00
Ondřej Surý	ecf042991c	Fix typo __SANITIZE_ADDRESS -> __SANITIZE_ADDRESS__ When checking for Address Sanitizer to disable the inactivehandles caching, there was a typo in the macro.	2022-02-24 00:15:16 +01:00
Ondřej Surý	be339b3c83	Disable inactive uvreqs caching when compiled with sanitizers When isc__nm_uvreq_t gets deactivated, it could be just put onto array stack to be reused later to save some initialization time. Unfortunately, this might hide some use-after-free errors. Disable the inactive uvreqs caching when compiled with Address or Thread Sanitizer.	2022-02-24 00:15:16 +01:00
Ondřej Surý	92cce1da65	Disable inactive handles caching when compiled with sanitizers When isc_nmhandle_t gets deactivated, it could be just put onto array stack to be reused later to save some initialization time. Unfortunately, this might hide some use-after-free errors. Disable the inactive handles caching when compiled with Address or Thread Sanitizer.	2022-02-23 23:21:29 +01:00
Ondřej Surý	e2555a306f	Remove active handles tracking from isc__nmsocket_t The isc__nmsocket_t has locked array of isc_nmhandle_t that's not used for anything. The isc__nmhandle_get() adds the isc_nmhandle_t to the locked array (resizing it if necessary), and the handle is removed from the array when isc_nmhandle_put() finally destroys it. That's all it does, so it serves no useful purpose. Remove the .ah_handles, .ah_size, and .ah_frees members of the isc__nmsocket_t and .ah_pos member of the isc_nmhandle_t struct.	2022-02-23 22:54:47 +01:00
Ondřej Surý	3268627916	Delay isc__nm_uvreq_t deallocation to connection callback When the TCP, TCPDNS or TLSDNS connection times out, the isc__nm_uvreq_t would be pushed into sock->inactivereqs before the uv_tcp_connect() callback finishes. Because the isc__nmsocket_t keeps the list of inactive isc__nm_uvreq_t, this would cause use-after-free only when the sock->inactivereqs is full (which could never happen because the failure happens in connection timeout callback) or when the sock->inactivereqs mechanism is completely removed (f.e. when running under Address or Thread Sanitizer). Delay isc__nm_uvreq_t deallocation to the connection callback and only signal the connection callback should be called by shutting down the libuv socket from the connection timeout callback.	2022-02-23 22:54:47 +01:00
Ondřej Surý	88418c3372	Properly free up enqueued netievents in nm_destroy() When the isc_netmgr is being destroyed, the normal and priority queues should be dequeued and netievents properly freed. This wasn't the case.	2022-02-23 22:51:12 +01:00
Ondřej Surý	d01562f22b	Remove the keep-response-order ACL map The keep-response-order option has been obsoleted. This commit removes the keep-response-order ACL map (rendering the option a no-op), the call to isc_nm_sequential(), and the now unused isc_nm_sequential() function itself.	2022-02-18 09:16:03 +01:00
Ondřej Surý	4f5b4662b6	Remove the limit on the number of simultaneous TCP queries There was an artificial limit of 23 on the number of simultaneous pipelined queries in the single TCP connection. The new network manager is capable of handling "unlimited" (limited only by the TCP read buffer size) queries similar to "unlimited" handling of the DNS queries received over UDP. Don't limit the number of TCP queries that we can process within a single TCP read callback.	2022-02-17 16:19:12 -08:00
Ondřej Surý	3c7b04d015	Add network manager based timer API This commit adds an API that allows creating arbitrary timers associated with the network manager handles.	2022-02-17 21:38:17 +01:00
Ondřej Surý	4716c56ebb	Reset the TCP connection when garbage is received When invalid DNS message is received, there was a handling mechanism for DoH that would be called to return proper HTTP response. Reuse this mechanism and reset the TCP connection when the client is blackholed, DNS message is completely bogus or the ns_client receives response instead of query.	2022-02-17 20:39:55 +01:00
Ondřej Surý	a89d9e0fa6	Add isc_nmhandle_setwritetimeout() function In some situations (unit test and forthcoming XFR timeouts MR), we need to modify the write timeout independently of the read timeout. Add a isc_nmhandle_setwritetimeout() function that could be called before isc_nm_send() to specify a custom write timeout interval.	2022-02-17 09:06:58 +01:00
Ondřej Surý	408b362169	Add TCP, TCPDNS and TLSDNS write timer When the outgoing TCP write buffers are full because the other party is not reading the data, the uv_write() could wait indefinitely on the uv_loop and never calling the callback. Add a new write timer that uses the `tcp-idle-timeout` value to interrupt the TCP connection when we are not able to send data for defined period of time.	2022-02-17 09:06:58 +01:00
Ondřej Surý	cd3b58622c	Add uv_tcp_close_reset compat The uv_tcp_close_reset() function was added in libuv 1.32.0 and since we support older libuv releases, we have to add a shim uv_tcp_close_reset() implementation loosely based on libuv.	2022-02-17 09:06:58 +01:00
Ondřej Surý	45a73c113f	Rename sock->timer to sock->read_timer Before adding the write timer, we have to rename the generic sock->timer to sock->read_timer. We don't touch the function names to limit the impact of the refactoring.	2022-02-17 09:06:58 +01:00
Ondřej Surý	8715be1e4b	Use UV_RUNTIME_CHECK() as appropriate Replace the RUNTIME_CHECK() calls for libuv API calls with UV_RUNTIME_CHECK() to get more detailed error message when something fails and should not.	2022-02-16 11:16:57 +01:00
Ondřej Surý	62e15bb06d	Add UV_RUNTIME_CHECK() macro to print uv_strerror() When libuv functions fail, they return correct return value that could be useful for more detailed debugging. Currently, we usually just check whether the return value is 0 and invoke assertion error if it isn't, throwing away the details of why the call has failed. Unfortunately, this often happens on more exotic platforms. Add a UV_RUNTIME_CHECK() macro that can be used to print more detailed error message (via uv_strerror()) before ending the execution of the program abruptly with the assertion.	2022-02-16 11:16:57 +01:00
Ondřej Surý	2ae84702ad	Add log message when hard quota is reached in TCP accept When isc_quota_attach_cb() API returns ISC_R_QUOTA (meaning hard quota was reached) the accept_connection() would return without logging a message about quota reached. Change the connection callback to log the quota reached message.	2022-02-01 21:00:05 +01:00
Ondřej Surý	b5e086257d	Explicitly enable IPV6_V6ONLY on the netmgr sockets Some operating systems (OpenBSD and DragonFly BSD) don't restrict the IPv6 sockets to sending and receiving IPv6 packets only. Explicitly enable the IPV6_V6ONLY socket option on the IPv6 sockets to prevent failures from using the IPv4-mapped IPv6 address.	2022-01-17 22:16:27 +01:00
Evan Hunt	be0bc24c7f	add UV_ENOTSUP to isc__nm_uverr2result() This error code is now mapped to ISC_R_FAMILYNOSUPPORT.	2022-01-17 11:45:10 +01:00
Artem Boldariev	ca9fe3559a	DoH: ensure that server_send_error_response() is used properly The server_send_error_response() function is supposed to be used only in case of failures and never in case of legitimate requests. Ensure that ISC_HTTP_ERROR_SUCCESS is never passed there by mistake.	2022-01-14 16:00:42 +02:00
Artem Boldariev	a38b4945c1	DoH: add bad HTTP/2 requests logging Add some error logging when facing bad requests over HTTP/2. Log the address and the error description.	2022-01-14 16:00:42 +02:00
Ondřej Surý	0a4e91ee47	Revert "Always enqueue isc__nm_tcp_resumeread()" The commit itself is harmless, but at the same time it is also useless, so we are reverting it. This reverts commit `11c869a3d5`.	2022-01-13 19:06:39 +01:00
Ondřej Surý	7370725008	Fix the UDP recvmmsg support Previously, the netmgr/udp.c tried to detect the recvmmsg support in libuv with #ifdef UV_UDP_<foo> preprocessor macros. However, because the UV_UDP_<foo> are not preprocessor macros, but enum members, the detection didn't work. Because the detection didn't work, the code didn't have access to the information when we received the final chunk of the recvmmsg and tried to free the uvbuf every time. Fortunately, the isc__nm_free_uvbuf() had a kludge that detected attempt to free in the middle of the receive buffer, so the code worked. However, libuv 1.37.0 changed the way the recvmmsg was enabled from implicit to explicit, and we checked for yet another enum member presence with preprocessor macro, so in fact libuv recvmmsg support was never enabled with libuv >= 1.37.0. This commit changes the preprocessor macros to autoconf checks for declaration, so the detection now works again. On top of that, it's now possible to cleanup the alloc_cb and free_uvbuf functions because now, the information whether we can or cannot free the buffer is available to us.	2022-01-13 19:06:39 +01:00
Ondřej Surý	58bd26b6cf	Update the copyright information in all files in the repository This commit converts the license handling to adhere to the REUSE specification. It specifically: 1. Adds used licenses to LICENSES/ directory 2. Adds "isc" template for adding the copyright boilerplate 3. Changes all source files to include copyright and SPDX license header, this includes all the C sources, documentation, zone files, configuration files. There are notes in the doc/dev/copyrights file on how to add correct headers to the new files. 4. Handles the rest that can't be modified via .reuse/dep5 file. The binary (or otherwise unmodifiable) files could have license places next to them in <foo>.license file, but this would lead to cluttered repository and most of the files handled in the .reuse/dep5 file are system test files.	2022-01-11 09:05:02 +01:00
Ondřej Surý	11c869a3d5	Always enqueue isc__nm_tcp_resumeread() The isc__nm_tcp_resumeread() was using maybe_enqueue function to enqueue netmgr event which could cause the read callback to be executed immediately if there was enough data waiting in the TCP queue. If such thing would happen, the read callback would be called before the previous read callback was finished and the worker receive buffer would be still marked "in use" causing an assertion failure. This would affect only raw TCP channels, e.g. rndc and http statistics.	2022-01-06 10:34:04 -08:00
Ondřej Surý	6269fce0fe	Use isc_mem_get_aligned() for isc_queue and cleanup max_threads The isc_queue_new() was using dirty tricks to allocate the head and tail members of the struct aligned to the cacheline. We can now use isc_mem_get_aligned() to allocate the structure to the cacheline directly. Use ISC_OS_CACHELINE_SIZE (64) instead of arbitrary ALIGNMENT (128), one cacheline size is enough to prevent false sharing. Cleanup the unused max_threads variable - there was actually no limit on the maximum number of threads. This was changed a while ago.	2022-01-05 17:10:58 +01:00
Artem Boldariev	5b7d4341fe	Use the TLS context cache for server-side contexts Using the TLS context cache for server-side contexts could reduce the number of contexts to initialise in the configurations when e.g. the same 'tls' entry is used in multiple 'listen-on' statements for the same DNS transport, binding to multiple IP addresses. In such a case, only one TLS context will be created, instead of a context per IP address, which could reduce the initialisation time, as initialising even a non-ephemeral TLS context introduces some delay, which can be visually noticeable by log activity. Also, this change lays down a foundation for Mutual TLS (when the server validates a client certificate, additionally to a client validating the server), as the TLS context cache can be extended to store additional data required for validation (like intermediates CA chain). Additionally to the above, the change ensures that the contexts are not being changed after initialisation, as such a practice is frowned upon. Previously we would set the supported ALPN tags within isc_nm_listenhttp() and isc_nm_listentlsdns(). We do not do that for client-side contexts, so that appears to be an oversight. Now we set the supported ALPN tags right after server-side contexts creation, similarly to how we do for client-side ones.	2021-12-29 10:25:14 +02:00
Michał Kępień	ea89ab80ae	Fix error codes passed to connection callbacks Commit `9ee60e7a17` erroneously introduced duplicate conditions to several existing conditional statements responsible for determining error codes passed to connection callbacks upon failure. Fix the affected expressions to ensure connection callbacks are invoked with: - the ISC_R_SHUTTINGDOWN error code when a global netmgr shutdown is in progress, - the ISC_R_CANCELED error code when a specific operation has been canceled. This does not fix any known bugs, it only adjusts the changes introduced by commit `9ee60e7a17` so that they match its original intent.	2021-12-28 15:09:50 +01:00
Ondřej Surý	57d0fabadd	Stop leaking mutex in nmworker and cond in nm socket On FreeBSD, the pthread primitives are not solely allocated on stack, but part of the object lives on the heap. Missing pthread_*_destroy causes the heap memory to grow and in case of fast lived object it's possible to run out-of-memory. Properly destroy the leaking mutex (worker->lock) and the leaking condition (sock->cond).	2021-12-08 17:58:53 +01:00
Ondřej Surý	20ac73eb22	Improve the logging on failed TCP accept Previously, when TCP accept failed, we have logged a message with ISC_LOG_ERROR level. One common case, how this could happen is that the client hits TCP client quota and is put on hold and when resumed, the client has already given up and closed the TCP connection. In such case, the named would log: TCP connection failed: socket is not connected This message was quite confusing because it actually doesn't say that it's related to the accepting the TCP connection and also it logs everything on the ISC_LOG_ERROR level. Change the log message to "Accepting TCP connection failed" and for specific error states lower the severity of the log message to ISC_LOG_INFO.	2021-12-02 13:50:00 +01:00
Artem Boldariev	f0e18f3927	Add isc_nm_has_encryption() This commit adds an isc_nm_has_encryption() function intended to check if a given handle is backed by a connection which uses encryption.	2021-11-30 12:20:22 +02:00
Artem Boldariev	07cf827b0b	Add isc_nm_socket_type() This commit adds an isc_nm_socket_type() function which can be used to obtain a handle's socket type. This change obsoletes isc_nm_is_tlsdns_handle() and isc_nm_is_http_handle(). However, it was decided to keep the latter as we eventually might end up supporting multiple HTTP versions.	2021-11-30 12:20:22 +02:00
Artem Boldariev	b211fff4cb	TLS stream: disable TLS I/O debug log message by default This commit makes the TLS stream code to not issue mostly useless debug log message on error during TLS I/O. This message was cluttering logs a lot, as it can be generated on (almost) any non-clean TLS connection termination, even in the cases when the actual query completed successfully. Nor does it provide much value for end-users, yet it can occasionally be seen when using dig and quite often when running BIND over a publicly available network interface.	2021-11-26 10:23:17 +02:00
Artem Boldariev	0b0c29dd51	DoH: Remove unneeded isc__nmsocket_prep_destroy() call This commit removes unneeded isc__nmsocket_prep_destroy() call on ALPN negotiation failure, which was eventually causing the TLS handle to leak. This call is not needed, as not attaching to the transport (TLS) handle should be enough. At this point it seems like a kludge from earlier days of the TLS code.	2021-11-26 10:23:17 +02:00
Artem Boldariev	6c8a97c78f	Fix a crash on unexpected incoming DNS message during XoT xfer This commit fixes a peculiar corner case in the client-side DoT code because of which a crash could occur during a zone transfer. A junk DNS message should be sent at the end of a zone transfer via TLS to trigger the crash (abort). This commit, hopefully, fixes that. Also, this commit adds similar changes to the TCP DNS code, as it shares the same origin and most of the logic.	2021-11-24 11:18:36 +02:00
Evan Hunt	7f63ee3bae	address '--disable-doh' failures Change 5756 (GL #2854) introduced build errors when using 'configure --disable-doh'. To fix this, isc_nm_is_http_handle() is now defined in all builds, not just builds that have DoH enabled. Missing code comments were added both for that function and for isc_nm_is_tlsdns_handle().	2021-11-17 13:48:43 -08:00
Artem Boldariev	80482f8d3e	DoH: Add isc_nm_set_min_answer_ttl() This commit adds an isc_nm_set_min_answer_ttl() function which is intended to be used to give a hint to the underlying transport regarding the answer TTL. The interface is intentionally kept generic because over time more transports might benefit from this functionality, but currently it is intended for DoH to set the "max-age" value within the "Cache-Control" HTTP header (as recommended in RFC8484, section 5.1 "Cache Interaction"). It is a no-op for other DNS transports for the time being.	2021-11-05 14:14:59 +02:00
Evan Hunt	32b50407bf	check statichandle before attaching it is possible for udp_recv_cb() to fire after the socket is already shutting down and statichandle is NULL; we need to create a temporary handle in this case.	2021-10-18 14:21:04 -07:00
Evan Hunt	a55589f881	remove all references to isc_socket and related types Removed socket.c, socket.h, and all references to isc_socket_t, isc_socketmgr_t, isc_sockevent_t, etc.	2021-10-15 01:01:25 -07:00
Evan Hunt	075139f60e	netmgr: refactor isc__nm_incstats() and isc__nm_decstats() route/netlink sockets don't have stats counters associated with them, so it's now necessary to check whether socket stats exist before incrementing or decrementing them. rather than relying on the caller for this, we now just pass the socket and an index, and the correct stats counter will be updated if it exists.	2021-10-15 00:57:02 -07:00
Evan Hunt	8c51a32e5c	netmgr: add isc_nm_routeconnect() isc_nm_routeconnect() opens a route/netlink socket, then calls a connect callback, much like isc_nm_udpconnect(), with a handle that can then be monitored for network changes. Internally the socket is treated as a UDP socket, since route/netlink sockets follow the datagram contract.	2021-10-15 00:56:58 -07:00
Evan Hunt	8d6bf826c6	netmgr: refactor isc__nm_incstats() and isc__nm_decstats() After support for route/netlink sockets is merged, not all sockets will have stats counters associated with them, so it's now necessary to check whether socket stats exist before incrementing or decrementing them. rather than relying on the caller for this, we now just pass the socket and an index, and the correct stats counter will be updated if it exists.	2021-10-15 00:40:37 -07:00
Ondřej Surý	e603983ec9	Stop providing branch prediction information The __builtin_expect() can be used to provide the compiler with branch prediction information. The Gcc manual says[1] on the subject: In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. Stop using __builtin_expect() and ISC_LIKELY() and ISC_UNLIKELY() macros to provide the branch prediction information as the performance testing shows that named performs better when the __builtin_expect() is not being used. 1. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect	2021-10-14 10:33:24 +02:00
Artem Boldariev	abecfdc298	DoT: do not attempt to call read callback if it is not available This commit fixes a crash in DoT code when it was attempting to call a read callback on the later stages of the connection when it is not available. It also fixes [GL #2884] (back-trace provided in the bug report is exactly the same as was seen when fixing this problem).	2021-10-05 11:26:14 +03:00
Artem Boldariev	25b2c6ad96	Require "dot" ALPN token for zone transfer requests over DoT (XoT) This commit makes BIND verify that zone transfers are allowed to be done over the underlying connection. Currently, it makes sense only for DoT, but the code is deliberately made to be protocol-agnostic.	2021-10-05 11:23:47 +03:00
Artem Boldariev	eba3278e52	Add isc_nm_xfr_allowed() function The intention of having this function is to have a predicate to check if a zone transfer could be performed over the given handle. In most cases we can assume that we can do zone transfers over any stream transport except DoH, but this assumption will not work for zone transfers over DoT (XoT), as the RFC9103 requires ALPN to happen, which might not be the case for all deployments of DoT.	2021-10-05 11:23:47 +03:00
Artem Boldariev	56b3f5d832	Low level code to support ALPN in DoT This commit adds low-level code necessary to support ALPN in DoT as XoT requires "dot" ALPN token to be negotiated on a connection for zone transfers.	2021-10-05 11:23:47 +03:00
Evan Hunt	08ce69a0ea	Rewrite dns_resolver and dns_request to use netmgr timeouts - The `timeout_action` parameter to dns_dispatch_addresponse() has been replaced with a netmgr callback that is called when a dispatch read times out. this callback may optionally reset the read timer and resume reading. - Added a function to convert isc_interval to milliseconds; this is used to translate fctx->interval into a value that can be passed to dns_dispatch_addresponse() as the timeout. - Note that netmgr timeouts are accurate to the millisecond, so code to check whether a timeout has been reached cannot rely on microsecond accuracy. - If serve-stale is configured, then a timeout received by the resolver may trigger it to return stale data, and then resume waiting for the read timeout. this is no longer based on a separate stale timer. - The code for canceling requests in request.c has been altered so that it can run asynchronously. - TCP timeout events apply to the dispatch, which may be shared by multiple queries. since in the event of a timeout we have no query ID to use to identify the resp we wanted, we now just send the timeout to the oldest query that was pending. - There was some additional refactoring in the resolver: combining fctx_join() and fctx_try_events() into one function to reduce code duplication, and using fixednames in fetchctx and fetchevent. - Incidental fix: new_adbaddrinfo() can't return NULL anymore, so the code can be simplified.	2021-10-02 11:39:56 -07:00
Ondřej Surý	9ee60e7a17	netmgr fixes needed for dispatch - The read timer must always be stopped when reading stops. - Read callbacks can now call isc_nm_read() again in TCP, TCPDNS and TLSDNS; previously this caused an assertion. - The wrong failure code could be sent after a UDP recv failure because the if statements were in the wrong order. the check for a NULL address needs to be after the check for an error code, otherwise the result will always be set to ISC_R_EOF. - When aborting a read or connect because the netmgr is shutting down, use ISC_R_SHUTTINGDOWN. (ISC_R_CANCELED is now reserved for when the read has been canceled by the caller.) - A new function isc_nmhandle_timer_running() has been added enabling a callback to check whether the timer has been reset after processing a timeout. - Incidental netmgr fix: always use isc__nm_closing() instead of referencing sock->mgr->closing directly - Corrected a few comments that used outdated function names.	2021-10-02 11:39:56 -07:00
Evan Hunt	d9e1ad9e37	Remove reference count REQUIRE in isc_nm_read() Previously isc_nm_read() required references on the handle to be at least 2, under the assumption that it would only ever be called from a connect or accept callback. however, it can also be called from a read callback, in which case the reference count might be only 1.	2021-10-02 11:39:56 -07:00
Mark Andrews	8fc9bb8e8e	Address use before NULL check warning of ievent->sock Reorder REQUIRE checks to ensure ievent->sock is checked earlier	2021-09-28 11:57:47 +10:00
Mark Andrews	7079829b84	Address use before NULL check warning of uvreq Move the dereference of uvreq until after the NULL check.	2021-09-28 11:57:47 +10:00
Ondřej Surý	8248da3b83	Preserve the contents of socket buffer on realloc On TCPDNS/TLSDNS read callback, the socket buffer could be reallocated if the received contents would be larger than the buffer. The existing code would not preserve the contents of the existing buffer, which led to the loss of the already received data. This commit replaces the isc_mem_put()+isc_mem_get() with isc_mem_reget() to preserve the existing contents of the socket buffer.	2021-09-23 22:36:01 +02:00
Ondřej Surý	8edbd0929f	Use isc_mem_reget() to handle the internal active handle cache The netmgr has an internal cache for freed active handles. This cache was allocated using the isc_mem_allocate()/isc_mem_free() API because it was simpler to reallocate the cache when we needed to grow it. The new isc_mem_reget() function could be used here, reducing the need to use the isc_mem_allocate() API, which is a tad slower than the isc_mem_get() API.	2021-09-23 22:17:15 +02:00
Artem Boldariev	530133c10f	Unify DoH URI making throughout the codebase This commit adds new function isc_nm_http_makeuri() which is supposed to unify DoH URI construction throughout the codebase. It handles IPv6 addresses, hostnames, and IPv6 addresses given as hostnames properly, and replaces similar ad-hoc code in the codebase.	2021-08-30 10:21:58 +03:00
Evan Hunt	fc6f751fbe	replace per-protocol keepalive functions with a common one this commit removes isc__nm_tcpdns_keepalive() and isc__nm_tlsdns_keepalive(); keepalive for these protocols and for TCP will now be set directly from isc_nmhandle_keepalive(). protocols that have an underlying TCP socket (i.e., TLS stream and HTTP), now have protocol-specific routines, called by isc_nmhandle_keepalive(), to set the keepalive value on the underlying socket.	2021-08-27 10:02:10 -07:00
Evan Hunt	7867b8b57d	enable keepalive when the keepalive EDNS option is seen previously, receiving a keepalive option had no effect on how long named would keep the connection open; there was a place to configure the keepalive timeout but it was never used. this commit corrects that. this also fixes an error in isc__nm_{tcp,tls}dns_keepalive() in which the sense of a REQUIRE test was reversed; previously this error had not been noticed because the functions were not being used.	2021-08-27 09:56:51 -07:00
Evan Hunt	19e24e22f5	cleanup netmgr-int.h - fix some duplicated and out-of-order prototypes declared in netmgr-int.h - rename isc_nm_tcpdns_keepalive to isc__nm_tcpdns_keepalive as it's for internal use	2021-08-27 09:56:51 -07:00
Artem Boldariev	8a655320c8	Fix a crash (in dig) when closing HTTP socket with unused session This commit fixes a crash (caused by an assert) when closing an HTTP/2 socket with unused HTTP/2 session.	2021-08-27 12:14:48 +03:00
Artem Boldariev	32cd4367a3	Make no assumptions regarding HTTP headers processing order This commit changes the DoH code in such a way that it makes no assumptions regarding which headers are expected to be processed first. In particular, the code expected the :method: pseudo-header to be processed early, which might not be true.	2021-08-25 10:32:56 +03:00
Ondřej Surý	87d5c8ab7c	Disable the Path MTU Discovery on UDP Sockets Instead of disabling the fragmentation on the UDP sockets, we now disable the Path MTU Discovery by setting IP(V6)_MTU_DISCOVER socket option to IP_PMTUDISC_OMIT on Linux and disabling IP(V6)_DONTFRAG socket option on FreeBSD. This option sets DF=0 in the IP header and also ignores the Path MTU Discovery. As additional mitigation on Linux, we recommend setting net.ipv4.ip_no_pmtu_disc to Mode 3: Mode 3 is a hardened pmtu discover mode. The kernel will only accept fragmentation-needed errors if the underlying protocol can verify them besides a plain socket lookup. Current protocols for which pmtu events will be honored are TCP, SCTP and DCCP as they verify e.g. the sequence number or the association. This mode should not be enabled globally but is only intended to secure e.g. name servers in namespaces where TCP path mtu must still work but path MTU information of other protocols should be discarded. If enabled globally this mode could break other protocols.	2021-08-19 07:12:33 +02:00
Artem Boldariev	e639957b58	Optimise TLS stream for small write size (<= 512 bytes) This commit changes TLS stream behaviour in such a way that it is now optimised for small writes. When there is a need to write 512 bytes or less, we can avoid calling the memory allocator at the expense of a possible slight increase in memory usage. For larger writes, the behaviour remains unchanged.	2021-08-12 14:28:17 +03:00
Artem Boldariev	e301e1e3b8	Avoid memory copying during send in TLS stream At least at this point doing memory copying is not required. Probably it was a workaround for some problem in the earlier days of DoH, at this point it appears to be a waste of CPU cycles.	2021-08-12 14:28:17 +03:00
Artem Boldariev	bd69c7c57c	Simplify buffering code logic in http_send_outgoing() This commit significantly simplifies the code in http_send_outgoing(), which was unnecessarily complicated: it dealt with multiple statically and dynamically allocated buffers, making it extremely hard to follow and causing unnecessary memory copying in some situations. This commit fixes these issues, while retaining the high level buffering logic.	2021-08-12 14:28:17 +03:00
Artem Boldariev	a32faa20b4	DoH: replace a custom buffer code for POST data with isc_buffer_t This commit replaces the custom buffer code in client-side DoH code intended to keep track of POST data, with isc_buffer_t.	2021-08-12 14:28:17 +03:00
Artem Boldariev	5b52a7e37e	When terminating a client session, mark it as closing When an HTTP/2 client terminates a session it means that it is about to close the underlying connection. However, we were not doing that. As a result, with the latest changes to the test suite, which made it limit the number of requests per transport connection, the tests using quota would hang for quite a while. This commit fixes that.	2021-08-12 14:28:17 +03:00
Artem Boldariev	a05728beb0	Do not call http_do_bio() in isc__nm_http_request() The function should not be called here because it is, in general, supposed to be called at the end of the transport level callbacks to perform I/O, and thus, calling it here is clearly a mistake because it breaks other code expectations. As a result of the call to http_do_bio() from within isc__nm_http_request() the unit tests were running slower than expected in some situations. In this particular situation http_do_bio() is going to be called at the end of the transport_connect_cb() (initially), or http_readcb(), sending all of the scheduled requests at once. This change affects only the test suite because it is the only place in the codebase where isc__nm_http_request() is used in order to ensure that the server is able to handle multiple HTTP/2 streams at once.	2021-08-12 14:28:16 +03:00
Artem Boldariev	849d38b57b	Fix a crash by attaching to the transport socket as early as possible This commit fixes a crash in DoH caused by the transport handle being detached too early when sending outgoing data. We need to attach to the session->handle earlier because, as an indirect result of the nghttp2_session_mem_send() call, the session might get closed and the handle detached. However, there still might be some outgoing data to handle. Besides, even when the underlying socket was closed via the handle, we should still attempt to send outgoing data via isc_nm_send() to let it call the write callback passed to http_send_outgoing().	2021-08-12 14:28:16 +03:00
Artem Boldariev	e0704f2e5d	Use isc_buffer_t to keep track of outgoing response This commit gets rid of custom code taking care of response buffering by replacing the custom code with isc_buffer_t. Also, it gets rid of an unnecessary memory copying when sending a response.	2021-08-12 14:28:16 +03:00
Artem Boldariev	6fe4ab39b9	Use isc_buffer_t to keep track of incoming POST data This commit replaces the ad-hoc 64K buffer for incoming POST data with isc_buffer_t backed by dynamically allocated buffer sized accordingly to the value in the "Content-Length" header.	2021-08-12 14:28:16 +03:00
Artem Boldariev	0ca790d9bf	DoH: isc__buffer_usedregion->isc_buffer_usedregion in client_send() This commit replaces wrong usage of isc__buffer_usedregion() instead of implied isc_buffer_usedregion().	2021-08-12 14:28:16 +03:00
Artem Boldariev	2733cca3ac	Replace ad-hoc DNS message buffer in client code with isc_buffer_t The commit replaces an ad-hoc incoming DNS-message buffer in the client-side DoH code with isc_buffer_t. The commit also fixes a timing issue in the unit tests revealed by the change.	2021-08-12 14:28:16 +03:00
Artem Boldariev	c819caa3a1	Replace the HTTP/2 session's ad-hoc buffer with isc_buffer_t This commit replaces a static ad-hoc HTTP/2 session's temporary buffer with a realloc-able isc_buffer_t object, which is allocated on an as-needed basis, lowering the memory consumption somewhat. The buffer is needed in very rare cases, so allocating it prematurely is not wise. Also, it fixes a bug in http_readcb() where the ad-hoc buffer appeared to be improperly used, leading to a situation where already processed data from the receiving regions could be processed twice, while unprocessed data would never be processed.	2021-08-12 14:28:16 +03:00
Artem Boldariev	170cc41d5c	Get rid of some HTTP/2 related types when NGHTTP2 is not available This commit removes definitions of some DoH-related types when libnghttp2 is not available.	2021-08-04 10:32:27 +03:00
Artem Boldariev	f388b71378	Get rid of RW locks in the DoH code This commit gets rid of RW locks in a hot path of the DoH code. In the original design, it was implied that we add new endpoints after the HTTP listener was created. Such a design implies some locking. We do not need such flexibility, though. Instead, we could build a set of endpoints before the HTTP listener gets created. Such a design does not need RW locks at all.	2021-08-04 10:32:25 +03:00
Artem Boldariev	590e8e0b86	Make max number of HTTP/2 streams configurable This commit makes the number of concurrent HTTP/2 streams per connection configurable as a means to fight DDoS attacks. As soon as the limit is reached, BIND terminates the whole session. The commit adds a global configuration option (http-streams-per-connection) which can be overridden in an http <name> {...} statement as follows: http local-http-server { ... streams-per-connection 100; ... }; For now the default value is 100, which should be enough (e.g. NGINX uses 128, but it is a full-featured WEB-server). When using lower numbers (e.g. ~70), it is possible to hit the limit with e.g. flamethrower.	2021-07-16 11:50:22 +03:00
Artem Boldariev	954240467d	Verify HTTP paths both in incoming requests and in config file This commit adds the code (and some tests) which allows verifying validity of HTTP paths both in incoming HTTP requests and in BIND's configuration file.	2021-07-16 10:28:08 +03:00
Artem Boldariev	64cd7e8a7f	Fix crash in DoH on empty query string in GET requests An unhandled code path left GET query string data uninitialised (equal to NULL) and led to a crash during the requests' base64 data decoding. This commit fixes that.	2021-07-13 16:53:51 +03:00
Ondřej Surý	a9e6a7ae57	Disable setting the thread affinity It was discovered that setting the thread affinity on both the netmgr and netthread threads led to inconsistent recursive performance because sometimes the netmgr and netthread threads would compete over a single resource and sometimes not. Removing setting the affinity causes a slight dip in the authoritative performance around 5% (the measured range was from 3.8% to 7.8%), but the recursive performance is now consistently good.	2021-07-13 14:48:29 +02:00
Ondřej Surý	29a285a67d	Revert the allocate/free -> get/put change from jemalloc change In the jemalloc merge request, we missed the fact that ah_frees and ah_handles are reallocated which is not compatible with using isc_mem_get() for allocation and isc_mem_put() for deallocation. This commit reverts that part and restores use of isc_mem_allocate() and isc_mem_free().	2021-07-09 18:19:57 +02:00
Ondřej Surý	f487c6948b	Replace locked mempools with memory contexts Current mempools are kind of hybrid structures - they serve two purposes: 1. mempool with a lock is basically static sized allocator with pre-allocated free items 2. mempool without a lock is a doubly-linked list of preallocated items The first kind of usage could be easily replaced with jemalloc small sized arena objects and thread-local caches. The second usage not-so-much and we need to keep this (in libdns:message.c) for performance reasons.	2021-07-09 15:58:02 +02:00
Ondřej Surý	5ab05d1696	Replace isc_mem_allocate() usage with isc_mem_get() in netmgr.c The isc_mem_allocate() comes with additional cost because of the memory tracking. In this commit, we replace the usage with isc_mem_get() because we track the allocated sizes anyway, so it's possible to also replace isc_mem_free() with isc_mem_put().	2021-07-09 15:58:02 +02:00
Artem Boldariev	c6d0e3d3a7	Return HTTP status code for small/malformed requests This commit makes BIND return HTTP status codes for malformed or too small requests. DNS request processing code would ignore such requests. Such an approach works well for other DNS transport but does not make much sense for HTTP, not allowing it to complete the request/response sequence. Suppose execution has reached the point where DNS message handling code has been called. In that case, it means that the HTTP request has been successfully processed, and, thus, we are expected to respond to it either with a message containing some DNS payload or at least to return an error status code. This commit ensures that BIND behaves this way.	2021-07-09 16:37:08 +03:00
Artem Boldariev	fedff2cd6c	Return "Bad Request" (400) in a case of Base64 decoding error This error code fits better than the more generic "Internal Server Error" (500) which implies that the problem is on the server. Also, do not end the whole HTTP/2 session on a bad request.	2021-07-09 16:26:46 +03:00
Artem Boldariev	1792740075	Ignore an "Accept" HTTP header value We were too strict regarding the value and presence of "Accept" HTTP header, slightly breaking compatibility with the specification. According to RFC8484 client SHOULD add "Accept" header to the requests but MUST be able to handle "application/dns-message" media type regardless of the value of the header. That basically suggests we ignore its value. Besides, verifying the value of the "Accept" header is a bit tricky because it could contain multiple media types, thus requiring proper parsing. That is doable but does not provide us with any benefits. Among other things, not verifying the value also fixes compatibility with clients, which could advertise multiple media types as supported, which we should accept. For example, it is possible for a perfectly valid request to contain "application/dns-message", "application/*", and "*/*" in the "Accept" header value. Still, we would treat such a request as invalid.	2021-07-09 16:26:46 +03:00
Artem Boldariev	7b6945fb60	Fix BIND hanging when browsers end HTTP/2 streams prematurely The commit fixes BIND hanging when browsers end HTTP/2 streams prematurely (for example, by sending RST_STREAM). It ensures that isc__nmsocket_prep_destroy() will be called for an HTTP/2 stream, allowing it to be properly disposed. The problem was impossible to reproduce using dig or DoH benchmarking software (e.g. flamethrower) because these do not tend to end HTTP/2 streams prematurely.	2021-07-09 15:42:44 +03:00
Artem Boldariev	094fcc10e7	Move the code which calls server read callback into a separate func This commit moves the code which calls server read callback into a separate function to avoid code repetition.	2021-07-09 15:42:44 +03:00
Ondřej Surý	2bb454182b	Make the DNS over HTTPS support optional This commit adds two new autoconf options `--enable-doh` (enabled by default) and `--with-libnghttp2` (mandatory when DoH is enabled). When DoH support is disabled the library is not linked-in and support for http(s) protocol is disabled in the netmgr, named and dig.	2021-07-07 09:50:53 +02:00
Ondřej Surý	b941411072	Disable IP fragmentation on the UDP sockets In DNS Flag Day 2020, we started setting the DF (Don't Fragment) socket option on the UDP sockets. It turned out that this code was incomplete, leading to dropping the outgoing UDP packets. This has now been remedied, so it is possible to disable the fragmentation on the UDP sockets again as the sending error is now handled by sending back an empty response with the TC (truncated) bit set. This reverts commit `66eefac78c`.	2021-06-23 17:41:34 +02:00
Evan Hunt	a3ba95116e	Handle UDP send errors when sending DNS message larger than MTU When the fragmentation is disabled on UDP sockets, the uv_udp_send() call can fail with UV_EMSGSIZE for messages larger than path MTU. Previously, this error would end with just discarding the response. In this commit, a proper handling of such case is added and on such error, a new DNS response with truncated bit set is generated and sent to the client. This change allows us to disable the fragmentation on the UDP sockets again.	2021-06-23 17:41:34 +02:00
Ondřej Surý	ec86759401	Replace netmgr per-protocol sequential function with a common one Previously, each protocol (TCPDNS, TLSDNS) specified its own function to disable pipelining on the connection. An oversight would lead to an assertion failure when the opcode is not query over a non-TCPDNS protocol, because the isc_nm_tcpdns_sequential() function would be called on a non-TCPDNS socket. This commit removes the per-protocol functions and refactors the code to have and use a common isc_nm_sequential() function that either disables pipelining on the socket or handles the request in a protocol-specific manner. Currently it ignores the call for HTTP sockets and causes an assertion failure for protocols where it doesn't make sense to call the function at all.	2021-06-22 17:21:44 +03:00
Artem Boldariev	dc356bb196	Fix ASAN error in DoH (passing NULL to memmove()) The warning was produced by an ASAN build: runtime error: null pointer passed as argument 2, which is declared to never be null This commit fixes it by checking if nghttp2_session_mem_send() has actually returned anything.	2021-06-16 17:46:10 +03:00
Artem Boldariev	ccd2267b1c	Set sock->iface and sock->peer properly for layered connection types This change sets the mentioned fields properly and gets rid of kludges added in the times when we were keeping pointers to isc_sockaddr_t instead of copies. Among other things it helps to avoid a situation when garbage instead of an address appears in dig output.	2021-06-14 11:37:36 +03:00
Artem Boldariev	b84fa122ce	Make BIND refuse to serve XFRs over DoH We cannot use DoH for zone transfers. According to RFC8484 a DoH request contains exactly one DNS message (see Section 6: Definition of the "application/dns-message" Media Type, https://datatracker.ietf.org/doc/html/rfc8484#section-6). This makes DoH unsuitable for zone transfers as often (and usually!) these need more than one DNS message, especially for larger zones. As zone transfers over DoH are not (yet) standardised, nor discussed in RFC8484, the best thing we can do is to return "not implemented." Technically DoH can be used to transfer small zones which fit in one message, but that is not enough for the generic case. Also, this commit makes the server-side DoH code ensure that no multiple responses could be attempted to be sent over one HTTP/2 stream. In HTTP/2 one stream is mapped to one request/response transaction. Now the write callback will be called with failure error code in such a case.	2021-06-14 11:37:36 +03:00
Artem Boldariev	009752cab0	Pass an HTTP handle to the read callback when finishing a stream This commit fixes a leftover from an earlier version of the client-side DoH code when the underlying transport handle was used directly.	2021-06-14 11:37:36 +03:00
Artem Boldariev	d5d20cebb2	Fix a crash in the client-side DoH code (header processing callback) Support a situation in header processing callback when client side code could receive a belated response or part of it. That could happen when the HTTP/2 session was already closed, but there were some response data from server in flight. Other client-side nghttp2 callbacks code already handled this case. The bug became apparent after HTTP/2 write buffering was supported, leading to rare unit test failures.	2021-06-14 11:37:33 +03:00
Artem Boldariev	2dfc0d9afc	Nullify connect.cstream in time and keep track of all client streams This commit ensures that sock->h2.connect.cstream gets nullified when the object in question is deleted. This fixes a nasty crash in dig, exposed when receiving large responses, that led to a double free(). Also, it refactors how the client-side code keeps track of client streams, (hopefully) preventing similar errors from appearing in the future.	2021-06-14 11:37:29 +03:00
Artem Boldariev	5b507c1136	Fix BIND to serve large HTTP responses This commit makes NM code to report HTTP as a stream protocol. This makes it possible to handle large responses properly. Like: dig +https @127.0.0.1 A cmts1-dhcp.longlines.com	2021-06-14 11:37:17 +03:00
Ondřej Surý	440fb3d225	Completely remove BIND 9 Windows support The Windows support has been completely removed from the source tree and BIND 9 now no longer supports native compilation on Windows. We might consider reviewing mingw-w64 port if contributed by external party, but no development efforts will be put into making BIND 9 compile and run on Windows again.	2021-06-09 14:35:14 +02:00
Ondřej Surý	f14d870d15	Fix copy&paste error in setsockopt_off Because of copy&paste error the setsockopt_off macro would enable the socket option instead of disabling it.	2021-06-02 17:47:14 +02:00
Ondřej Surý	67afea6cfc	Cleanup the remaining of HAVE_UV_<func> macros While cleaning up the usage of HAVE_UV_<func> macros, we forgot to cleanup the HAVE_UV_UDP_CONNECT in the actual code and HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail on uv_udp_send() because the socket was already connected and we were falsely assuming that it was not. The platforms with autoconf support were not affected, because we were still checking for the functions from the configure.	2021-06-02 11:23:36 +02:00
Artem Boldariev	35d0027f36	HTTP/2 write buffering This commit adds the ability to consolidate HTTP/2 write requests if there is already one in flight. If it is the case, the code will consolidate multiple subsequent write request into a larger one allowing to utilise the network in a more efficient way by creating larger TCP packets as well as by reducing TLS records overhead (by creating large TLS records instead of multiple small ones). This optimisation is especially efficient for clients, creating many concurrent HTTP/2 streams over a transport connection at once. This way, the code might create a small amount of multi-kilobyte requests instead of many 50-120 byte ones. In fact, it turned out to work so well that I had to add a work-around to the code to ensure compatibility with the flamethrower, which, at the time of writing, does not support TLS records larger than two kilobytes. Now the code tries to flush the write buffer after 1.5 kilobyte, which is still pretty adequate for our use case. Essentially, this commit implements a recommendation given by nghttp2 library: https://nghttp2.org/documentation/nghttp2_session_mem_send.html	2021-06-01 21:07:45 +03:00
Ondřej Surý	87fe97ed91	Add asynchronous work API to the network manager libuv has support for running long-running tasks in dedicated threadpools, so that they don't affect networking IO. This commit adds an isc_nm_work_enqueue() wrapper that wraps around the libuv API and runs it on top of the associated worker loop. The only limitation is that the function must be called from inside a network manager thread, so the call to the function should be wrapped inside a (bound) task.	2021-05-31 14:52:05 +02:00
Ondřej Surý	211bfefbaa	Use UV_VERSION_HEX to decide whether we need libuv shim functions Instead of having a configure check for every missing function that has been added in later version of libuv, we now use UV_VERSION_HEX to decide whether we need the shim or not.	2021-05-31 14:52:05 +02:00
Ondřej Surý	7477d1b2ed	Add uv_os_getenv() and uv_os_setenv() compatibility shims The uv_os_getenv() and uv_os_setenv() functions were introduced in the libuv >= 1.12.0. Add simple compatibility shims for older versions.	2021-05-31 14:52:05 +02:00

597 commits