Ensure BIND can be tested on Windows in GitLab to more quickly catch
build and test errors on that operating system.
Some notes:
- While build jobs are triggered for all pipelines, system test jobs
are not - due to the time it takes to run the complete system test
suite on Windows (about 20 minutes), the latter are only run for
pipelines created through GitLab's web interface and for pipelines
created for Git tags.
- Only the "Release" build configuration is currently used. Adding
"Debug" builds is a matter of extending .gitlab-ci.yml, but it was
not done for the time being due to questionable usefulness of
performing such builds in GitLab CI.
- Only a 64-bit build is performed. Adding support for 32-bit builds
is not planned to be implemented.
- Unit tests are still not run on Windows, but adding support for that
is on the roadmap.
- All Windows GitLab CI jobs are run inside Windows Server containers,
using the Custom executor feature of GitLab Runner as Windows Server
2016 is not supported by GitLab Runner's native Docker on Windows
executor and Windows Server 2019 is not yet widely available from
hosting providers.
- The Windows Docker image used by GitLab CI is not stored in the
GitLab Container Registry as it is over 27 GB in size and thus
passing it between GitLab and its runners is impractical.
- There is no vcvarsall.bat variant written in PowerShell and batch
scripts are no longer supported by GitLab Runner Custom executor, so
the environment variables set by vcvarsall.bat are injected back
into the PowerShell environment by processing the output of "set".
- Visual Studio parallel builds are a bit different than "make -jX"
builds as parallelization happens in two tiers: project parallelism
(controlled by the "/maxCpuCount" msbuild.exe switch) and compiler
parallelism (controlled by the "/MP" cl.exe switch). To limit the
total number of compiler processes spawned concurrently to a value
similar to the one used for Unix builds, msbuild.exe is allowed to
build at most 2 projects at once, each of which can spawn up to half
of BUILD_PARALLEL_JOBS worth of compiler processes. Using such
parameters is a fairly arbitrary decision taken to solve the
trade-off between compilation speed and runner load.
- Configuring network addresses in Windows Server containers is
tricky. Adding 10.53.0.1/24 and similar addresses to the vEthernet
interface created by Docker never causes ifconfig.bat to fail, but
in fact only one container can have any given IP address configured
at any given time (the request to add the same address in another
container is silently ignored). Thus, in order to allow multiple
system test jobs to be run in parallel, the addresses used in system
tests are configured on the loopback interfaces. Interestingly
enough, the addresses set on the loopback interfaces... persist
between containers. Fortunately, this is acceptable for the time
being and only requires ifconfig.bat failures to be ignored (as
ifconfig.bat will fail if it attempts to configure an already
existing address on an interface). We also need to wait for a brief
moment after calling ifconfig.bat as the addresses the latter
attempts to configure may not be immediately available after it
returns (and that causes runall.sh to error out). Finally, for some
reason we also need to signal that the DNS servers on each loopback
interface are to be configured using DHCP or else ifconfig.bat will
fail to add the requested addresses.
- Since named.pid files created by named instances used in system
tests contain Windows PIDs instead of Cygwin PIDs and various
versions of Cygwin "kill" react differently when passed Windows PIDs
without the -W switch, all "kill" invocations in GitLab CI need to
use that switch (otherwise they would print error messages which
would cause stop.pl to assume the process being killed died
prematurely). However, to preserve compatibility with older Cygwin
versions used in our other Windows test environments, we alter the
relevant scripts "on the fly" rather than in the Git repository.
- In the containers used for running system tests, Windows Error
Reporting is configured to automatically create crash dumps in
C:\CrashDumps. This directory is examined after the test suite is
run to ensure no crashes went under stop.pl's radar.
The SYSTEMTESTTOP variable is set by bin/tests/system/run.sh. When
system tests are run on Windows, that variable will contain an absolute
Cygwin path. In the case of the "statschannel" system test, using the
unmodified SYSTEMTESTTOP variable in tests.sh causes the RNDCCMD
variable to contain an invocation of a native Windows application with
an absolute Cygwin path passed as a parameter, which prevents rndc from
working in that system test. Until we have a cleaner solution, override
SYSTEMTESTTOP with a relative path to work around the issue and thus fix
the "statschannel" system test on Windows.
Make sure the CYGWIN environment variable is set whenever system tests
are run on Windows to prevent stop.pl from making incorrect assumptions
about the environment it is running in, which triggers e.g. false
reports about named instances crashing on shutdown when system tests are
run on Windows. This issue has not been caught earlier because the
CYGWIN environment variable was incidentally being set on a higher level
in our Windows test environments.
Error reporting for parallel system tests on Windows has been broken all
along: since all parallel.mk targets generated by parallel.sh pipe their
output through "tee", the return code from run.sh is lost and thus
running "make -f parallel.mk check" will not yield a non-zero return
code if some system tests fail. The same applies to runsequential.sh.
Yet, runall.sh on Windows only sets its return code to a non-zero value
if either "make -f parallel.mk check" or runsequential.sh returns a
non-zero return code. Fix by making runall.sh yield a non-zero return
code when testsummary.sh fails, which is the same approach as the one
used in the "test" target in bin/tests/system/Makefile.
Until now, the build process for BIND on Windows involved upgrading the
solution file to the version of Visual Studio used on the build host.
Unfortunately, the executable used for that (devenv.exe) is not part of
Visual Studio Build Tools and thus there is no clean way to make that
executable part of a Windows Server container.
Luckily, the solution upgrade process boils down to just adding XML tags
to Visual Studio project files and modifying certain XML attributes - in
files which we pregenerate anyway using win32utils/Configure. Thus,
extend win32utils/Configure with three new command line parameters that
enable it to mimic what "devenv.exe bind9.sln /upgrade" does. This
makes the devenv.exe build step redundant and thus facilitates building
BIND in Windows Server containers.
Build configuration for the dnssec-cds Visual Studio project is absent
from the solution file template, which means the solution needs to be
upgraded using "devenv bind9.sln /upgrade" in order for the dnssec-cds
project to be built. Add the build configuration for dnssec-cds to the
solution file template so that upgrading the solution is not necessary
for building that project.
named-checkzone does not use libbind9. Update the Visual Studio project
file template for named-checkzone to reflect that, thus preventing
compilation issues during parallel builds.
When commit 8eb88aafee removed liblwres,
it also modified nsupdate to use libirs instead of liblwres, but the
Visual Studio project files were not updated to reflect that change.
Make sure the nsupdate Visual Studio project depends on the libirs
project to prevent compilation issues during parallel builds.
Make stderr fully buffered on Windows to improve named performance when
it is logging to stderr, which happens e.g. in system tests. Note that:
- line buffering (_IOLBF) is unavailable on Windows,
- fflush() is called anyway after each log message gets written to the
default stderr logging channels created by libisc.
BIND system tests are run in a Cygwin environment. Apparently Cygwin
shell sets the SEM_NOGPFAULTERRORBOX bit in its process error mode which
is then inherited by all spawned child processes. This bit prevents the
Windows Error Reporting dialog from being displayed, which I assume is
part of an effort to contain memory handling errors triggered by Cygwin
binaries in the Cygwin environment. Unfortunately, this also prevents
automatic crash dump creation by Windows Error Reporting and Cygwin
itself does not handle memory errors in native Windows processes spawned
from a Cygwin shell.
Fix by clearing the SEM_NOGPFAULTERRORBOX bit inside named if it is
started in a Cygwin environment, thus overriding the Cygwin-set process
error mode in order to enable Windows Error Reporting to handle all
named crashes.
When libxml2 is to be used in a multi-threaded application, the
xmlInitThreads() function must be called before any other libxml2
function. This function does different things on various platforms and
thus one can get away without calling it on Unix systems, but not on
Windows, where it initializes critical section objects used for
synchronizing access to data structures shared between threads. Add the
missing xmlInitThreads() call to prevent crashes on affected systems.
Also add a matching xmlCleanupThreads() call to properly release the
resources set up by xmlInitThreads().
No problems have been observed on the FreeBSD GitLab CI runner during
the burn-in period, when FreeBSD jobs needed to be triggered manually.
Thus, make the FreeBSD jobs run automatically along other GitLab CI
jobs.
`/usr/share/sgml/docbook/xsl-stylesheets` and `/usr/share/dblatex` are
places where docbook-style-xsl and, respectively, dblatex packages on
Red Hat systems put their XSL templates. Unless we hint this place it
has to be added to `./configure` manually (`--with-docbook-xsl=...`):
https://src.fedoraproject.org/rpms/bind/blob/master/f/bind.spec#_691.
On Fedora 30:
Before
```
./configure
...
checking for Docbook-XSL path... auto
checking for html/docbook.xsl... "not found"
checking for xhtml/docbook.xsl... "not found"
checking for manpages/docbook.xsl... "not found"
checking for html/chunk.xsl... "not found"
checking for xhtml/chunk.xsl... "not found"
checking for html/chunktoc.xsl... "not found"
checking for xhtml/chunktoc.xsl... "not found"
checking for html/maketoc.xsl... "not found"
checking for xhtml/maketoc.xsl... "not found"
checking for xsl/docbook.xsl... "not found"
checking for xsl/latex_book_fast.xsl... "not found"
```
After:
```
./configure
...
checking for Docbook-XSL path... auto
checking for html/docbook.xsl... /usr/share/sgml/docbook/xsl-stylesheets/html/docbook.xsl
checking for xhtml/docbook.xsl... /usr/share/sgml/docbook/xsl-stylesheets/xhtml/docbook.xsl
checking for manpages/docbook.xsl... /usr/share/sgml/docbook/xsl-stylesheets/manpages/docbook.xsl
checking for html/chunk.xsl... /usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl
checking for xhtml/chunk.xsl... /usr/share/sgml/docbook/xsl-stylesheets/xhtml/chunk.xsl
checking for html/chunktoc.xsl... /usr/share/sgml/docbook/xsl-stylesheets/html/chunktoc.xsl
checking for xhtml/chunktoc.xsl... /usr/share/sgml/docbook/xsl-stylesheets/xhtml/chunktoc.xsl
checking for html/maketoc.xsl... /usr/share/sgml/docbook/xsl-stylesheets/html/maketoc.xsl
checking for xhtml/maketoc.xsl... /usr/share/sgml/docbook/xsl-stylesheets/xhtml/maketoc.xsl
checking for xsl/docbook.xsl... /usr/share/dblatex/xsl/docbook.xsl
checking for xsl/latex_book_fast.xsl... /usr/share/dblatex/xsl/latex_book_fast.xsl
```
* CKR_CRYPTOKI_ALREADY_INITIALIZED: This value can only be returned by
`C_Initialize`. It means that the Cryptoki library has already been
initialized (by a previous call to `C_Initialize` which did not have a
matching `C_Finalize` call).
* CKR_FUNCTION_NOT_SUPPORTED: The requested function is not supported by this
Cryptoki library. Even unsupported functions in the Cryptoki API should have a
“stub” in the library; this stub should simply return the value
CKR_FUNCTION_NOT_SUPPORTED.
* CKR_LIBRARY_LOAD_FAILED: The Cryptoki library could not load a dependent
shared library.
The OASIS pkcs11.h header has a restrictive license. Replace the
pkcs11.h pkcs11f.h and pkcs11t.h headers with pkcs11.h from p11-kit.
For source distribution, the license for the OASIS headers itself
doesn't pose any licensing problem when combined with MPL license, but
it possibly creates problem for downstream distributors of BIND 9.