Libmaxmind does not provide any version macro for link time version.
Print at least runtime version library used, if linked.
(cherry picked from commit e6d7384c0d)
The util/prepare-softhsm2.sh script is useful for initializing a working
SoftHSM environment which can be used by unit tests and system tests.
However, since it is a test-specific script, it does not really belong
in the util/ subdirectory which is mostly pruned during the BIND source
tarball creation process. Move the prepare-softhsm2.sh script to
bin/tests/ so that its location is more appropriate for its purpose and
also so that it does not get removed during the BIND source tarball
creation process, allowing it to be used for setting up test
environments for tarball-based builds.
(cherry picked from commit c0be772ebc)
The LC_ALL=C assignments in the "idna" system test, which were only
meant to affect a certain subset of checks, in fact persist throughout
all the subsequent checks in that system test. That affects the test's
behavior and is misleading.
When the "VARIABLE=value command ..." syntax is used in a shell script,
in order for the variable assignment to only apply to "command", the
latter must be an external binary; otherwise, the VARIABLE=value
assignment persists for all subsequent commands in a script:
$ cat foo.sh
#!/bin/sh
foo() {
/bin/sh bar.sh
}
BAR="baz0"
BAR="baz1" /bin/sh bar.sh
echo "foo: BAR=${BAR}"
BAR="baz2" foo
echo "foo: BAR=${BAR}"
$ cat bar.sh
#!/bin/sh
echo "bar: BAR=${BAR}"
$ /bin/sh foo.sh
bar: BAR=baz1
foo: BAR=baz0
bar: BAR=baz2
foo: BAR=baz2
$
Fix by saving the value of LC_ALL before the relevant set of checks in
the "idna" system test, restoring it afterwards, and dropping the
"LC_ALL=C command ..." syntax.
(cherry picked from commit 2ee7ff23ce)
The autosign test has a test case where a DNSSEC maintaiend zone
has a set of DNSSEC keys without any timing metadata set. It
tests if named picks up the key for publication and signing if a
delayed dnssec-settime/loadkeys event has occured.
The test failed intermittently despite the fact it sleeps for 5
seconds but the triggered key reconfigure action should happen after
3 seconds.
However, the test output showed that the test query came in before
the key reconfigure action was complete (see excerpts below).
The loadkeys command is received:
15:38:36 received control channel command 'loadkeys delay.example.'
The reconfiguring zone keys action is triggered after 3 seconds:
15:38:39 zone delay.example/IN: reconfiguring zone keys
15:38:39 DNSKEY delay.example/NSEC3RSASHA1/7484 (ZSK) is now published
15:38:39 DNSKEY delay.example/NSEC3RSASHA1/7455 (KSK) is now published
15:38:39 writing to journal
Two seconds later the test query comes in:
15:38:41 client @0x7f1b8c0562b0 10.53.0.1#44177: query
15:38:41 client @0x7f1b8c0562b0 10.53.0.1#44177: endrequest
And 6 more seconds later the reconfigure keys action is complete:
15:38:47 zone delay.example/IN: next key event: 05-Dec-2019 15:48:39
This commit fixes the test by checking the "next key event" log has
been seen before executing the test query, making sure that the
reconfigure keys action has been complete.
This commit however does not fix, nor explain why it took such a long
time (8 seconds) to reconfigure the keys.
(cherry picked from commit 2e4273b55a)
The first step in all existing setup.sh scripts is to call clean.sh. To
reduce code duplication and ensure all system tests added in the future
behave consistently with existing ones, invoke clean.sh from run.sh
before calling setup.sh.
(cherry picked from commit d8905b7a9c)
Since the role of the bin/tests/system/clean.sh script has now been
reduced to calling a given system test's clean.sh script, remove the
former altogether and replace its only use with a direct invocation of
the latter.
(cherry picked from commit bf3eeac067)
Since files containing system test output are no longer stored in test
subdirectories, bin/tests/system/clean.sh no longer needs to take care
of removing the test.output file for a given test as testsummary.sh
already takes care of that and even if a test suite terminates
abnormally and another one is started, tee invoked without the -a
command line switch overwrites the destination file if it exists, so
leftover test.output.* files from previous test suite runs are not a
concern. Remove the -r command line switch and the code associated with
it from the relevant scripts.
(cherry picked from commit b4d37878f6)
Some clean.sh scripts contain overly broad file deletion wildcards which
cause the test.output file (used by the system test framework for
collecting output) in a given system test's directory to be erroneously
removed immediately after the test is started (due to setup.sh scripts
calling clean.sh at the beginning). This prevents the test's output
from being placed in bin/tests/system/systests.output at the end of a
test suite run and thus can lead to test failures being ignored. Fix by
storing each test's output in a test.output.<test-name> file in
bin/tests/system/, which prevents clean.sh scripts from removing it (as
they should only ever affect files contained in a given system test's
directory).
(cherry picked from commit b0916bba41)
At the end of each system test suite run, the system test framework
collects all existing test.output files from system test subdirectories
and produces bin/tests/system/systests.output from those files.
However, it does not check whether a test.output file was found for
every executed test. Thus, if the test.output file is accidentally
deleted by the system test itself (e.g. due to an overly broad file
removal wildcard present in clean.sh), its output will not be included
in bin/tests/system/systests.output. Since the result of each system
test suite run is determined by bin/tests/system/testsummary.sh, which
only operates on the contents of bin/tests/system/systests.output, this
can lead to test failures being ignored. Fix by ensuring the number of
test results found in bin/tests/system/systests.output is equal to the
number of tests run and triggering a system test suite failure in case
of a discrepancy between these two values.
(cherry picked from commit 3c3085be3c)
xmlInitThreads() and xmlCleanupThreads() are called from within
named_statschannels_configure() and named_statschannels_shutdown(),
respectively. Both of these functions are executed by worker threads,
not the main named thread. This causes ASAN to report memory leaks like
the following one upon shutdown (as long as named is asked to produce
any XML output over its configured statistics channels during its
lifetime):
Direct leak of 968 byte(s) in 1 object(s) allocated from:
#0 0x7f677c249cd8 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x7f677bc1838f in xmlGetGlobalState (/usr/lib/libxml2.so.2+0xa838f)
The data mentioned in the above report is a libxml2 state structure
stored as thread-specific data. Such chunks of memory are automatically
released (by a destructor passed to pthread_key_create() by libxml2)
whenever a thread that allocated a given chunk exits. However, if
xmlCleanupThreads() is called by a given thread before it exits, the
destructor will not be invoked (due to xmlCleanupThreads() calling
pthread_key_delete()) and ASAN will report a memory leak. Thus,
xmlInitThreads() and xmlCleanupThreads() must not be called from worker
threads. Since xmlInitThreads() must be called on Windows in order for
libxml2 to work at all, move xmlInitThreads() and xmlCleanupThreads()
calls to the main named thread (which does not produce any XML output
itself) in order to prevent the memory leak from being reported by ASAN.
(cherry picked from commit b425b5d56e)
Loaded GeoIP2 databases are only released when named is shut down, but
not during server reconfiguration. This causes memory to be leaked
every time "rndc reconfig" or "rndc reload" is used, as long as any
GeoIP2 database is in use. Fix by releasing any loaded GeoIP2 databases
before reloading them. Do not call dns_geoip_shutdown() until server
shutdown as that function releases the memory context used for caching
GeoIP2 lookup results.
(cherry picked from commit 670afbe84a)
When loading the configuration fails, there might be already other tasks
running and calling OpenSSL library functions. The OpenSSL on_exit
handler is called when exiting the main process and there's a timing
race between the on_exit function that destroys OpenSSL allocated
resources (threads, locks, ...) and other tasks accessing the very same
resources leading to a crash in the system threading library. Therefore,
the fatal() function needs to request exlusive access to the task
manager to finish the already running tasks and exit only when no other
tasks are running.
(cherry picked from commit 952d7fde63)
Ensure any unexpected failure in the "tcp" system test causes it to be
immediately interrupted with an error to make the aforementioned test
more reliable. Since the exit code for "expr 0 + 0" is 1, the status
variable needs to be updated using arithmetic expansion.
(cherry picked from commit 9841635b7f)
Ensure all checks in the "tcp" system test are numbered, so that
forensic data is preserved in case of any failure.
(cherry picked from commit 2f4877d11c)
assert_int_equal() calls in bin/tests/system/tcp/tests.sh pass the found
value as the first argument and the expected value as the second
argument, while the function interprets its arguments the other way
round. Fix argument handling in assert_int_equal() to make sure the
error messages printed by that function are correct.
(cherry picked from commit 6bd1f68bef)
In the TCP high-water checks, "rndc stats" is run after ans6 reports
that it opened the requested number of TCP connections. However, we
fail to account for the fact that ns5 might not yet have called accept()
for these connections, in which case the counts output by "rndc stats"
will be off. To prevent intermittent "tcp" system test failures, allow
the relevant connection count checks to be retried (just once, after one
second, as that should be enough for any system to accept() a dozen TCP
connections under any circumstances).
(cherry picked from commit 1e22e052d0)
Add a shell function which is used in the "tcp" system test, but has
been accidentally omitted from !2425. Make sure the function does not
change the value of "ret" itself, so that the caller can decide what to
do with the function's return value.
(cherry picked from commit 8bb7f1f2a1)
Test jitter distribution in NSEC3 dynamic zone and for a zone that has old
signatures. In both cases the generated signatures should be spread nicely.
(cherry picked from commit 540b90fd6c)
This variable will report the maximum number of simultaneous tcp clients
that BIND has served while running.
It can be verified by running rndc status, then inspect "tcp high-water:
count", or by generating statistics file, rndc stats, then inspect the
line with "TCP connection high-water" text.
The tcp-highwater variable is atomically updated based on an existing
tcp-quota system handled in ns/client.c.
(cherry picked from commit 66fe8627de)
The named_g_defaultdnstap was never used as the dnstap requires
explicit configuration of the output file.
Related scan-build report:
./server.c:3476:14: warning: Value stored to 'dpath' during its initialization is never read
const char *dpath = named_g_defaultdnstap;
^~~~~ ~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
(cherry picked from commit 6decd14592)
Portion of the digdelv test are skipped on IPv6 due to extra quotes
around $TESTSOCK6: "I:digdelv:IPv6 unavailable; skipping".
Researched by @michal.
Regressed with 351efd8812.
(cherry picked from commit 1b6419f8a7)
EDNS mechanisms only apply to DNS over UDP. Thus, errors encountered
while sending DNS queries over TCP must not influence EDNS timeout
statistics.
(cherry picked from commit fce3c93ea2)
If a TCP connection fails while attempting to send a query to a server,
the fetch context will be restarted without marking the target server as
a bad one. If this happens for a server which:
- was already marked with the DNS_FETCHOPT_EDNS512 flag,
- responds to EDNS queries with the UDP payload size set to 512 bytes,
- does not send response packets larger than 512 bytes,
and the response for the query being sent is larger than 512 byes, then
named will pointlessly alternate between sending UDP queries with EDNS
UDP payload size set to 512 bytes (which are responded to with truncated
answers) and TCP connections until the fetch context retry limit is
reached. Prevent such query loops by marking the server as bad for a
given fetch context if the advertised EDNS UDP payload size for that
server gets reduced to 512 bytes and it is impossible to reach it using
TCP.
(cherry picked from commit 6cd115994e)
I was truncating zone files for experimental purposes when I found
that `named-compilezone | head` got stuck. The full command line that
exhibited the problem was:
dig axfr dotat.at |
named-compilezone -o /dev/stdout dotat.at /dev/stdin |
head
This requires a large enough zone to exhibit the problem, more than
about 70000 bytes of plain text output from named-compilezone.
I was running the command on Debian Stretch amd64.
This was puzzling since it looked like something was suppressing the
SIGPIPE. I used `strace` to examine what was happening at the hang.
The program was just calling write() a lot to print the zone file, and
the last write() hanged until I sent it a SIGINT.
During some discussion with friends, Ian Jackson guessed that opening
/dev/stdout O_RDRW might be the problem, and after some tests we found
that this does in fact suppress SIGPIPE.
Since `named-compilezone` only needs to write to its output file, the
fix is to omit the stdio "+" update flag.
(cherry picked from commit a87ccea032)
It was found that NSEC Aggressive Caching has a significant performance impact
on BIND 9 when used as recursor. This commit disables the synth-from-dnssec
configuration option by default to provide immediate remedy for people running
BIND 9.12+. The NSEC Aggressive Cache will be enabled again after a proper fix
will be prepared.
(cherry picked from commit a20c42dca6)