This PR is a part of the actions necessary to make Certbot-CI work on Windows, in order to execute the integration tests on this platform.
I initially used the fully-fledged HTTP proxy [Traefik](https://docs.traefik.io/) to distribute HTTP challenges among several pytest nodes, and so parallelize the integration tests. Traefik for this purpose is overkill. We just want to redirect the ACME server to a pytest node depending on the `Host` header, and we use here a production-grade HTTP proxy for that.
However it was not a problem on Linux, as soon as you can have Docker, because this instance is deployed through it.
But this becomes a problem for Windows, where Docker is not available everywhere, very compelling on its setup, and limited by the implemented network drivers. See my comments here https://github.com/letsencrypt/pebble/pull/240 for more details.
Hopefully Python ships with everything needed to implement a simple HTTP proxy, with strictly what we need for the parallelization of integration tests.
This PR implements this kind of HTTP proxy, and remove the coupling to Traefik.
This PR has been tested successfully with integration tests on Pebble under Linux for Python 2.x and Python 3.x, and the proxy alone has been also tested successfully on Windows (no integration tests can be run for now on this platform).
* Create a python proxy
* Refactor proxy config
* Working logic
* Resolve from the path
* Give proxy process to the ACMEServer context manager
While I expect stale bot will close out 150 - 250 issues, that'll still leave us with 400+ open issues. My concern is that with a threshold of 6 months, most of these 400 issues will be in the same state 6 months from now and stale bot will annoy people by asking them if their issue is still valid too frequently.
Doubling the stale threshold to 1 year should mitigate this problem a bit I think.
* Turn off session tickets for versions of Nginx that support it
In line with Mozilla's security recommendations.
* Changelog.
* Set version before installing config files
* lint: remove unused import
* windows testfix
* another windows testfix?
* Testing path of updating src file with old nginx
* Fix windows, and make config update tests fail if update doesn't happen
Since #7073 for Certbot and letsencrypt/boulder@3918714 for Boulder have landed, the bash scripts that remained after certbot-ci are not useful anymore outside of Certbot.
Only remaining place is the apacheconftest-with-pebble tox target, which leverages pebble-fetch.py script to expose a running ACME server to the apache-conf-test script.
This PR refactor apacheconftest-with-pebble to use certbot-ci instead. Finally, this PR remove the remaining integration tests bash scripts, that are _common.sh, boulder-fetch.py and pebble-fetch.py.
* Disconnect common and boulder-fetch
* Prepare reconnection of apacheconftest to new pebble deployment logic
* Finish the configuration for apacheconftest
* Add executable flag to python script
* Fix shebang
* Delete pebble-fetch.sh
Currently integration tests against Boulder fail during nightly tests. See https://travis-ci.com/certbot/certbot/builds/115373954.
This is due to a failure to cleanup the workspace associated to the Boulder docker started during the integration tests. Indeed this docker compile several artifacts whose owner is root, and permissions are 0744. These files are persisted in the workspace folder attached to the Docker.
Since tox is run as a non-root user (but this user still have access to the Docker daemon), everything works fine until the end of the test suite, when all resources are cleaned up. At this point, pytest fires a PermissionError when failing to delete these artifacts, return with a non-zero exit code, and so fail the build.
Since this situation could happen outside of the CI, I made appropriate corrections to allow the integration tests to be run as a non-root user, instead of changing Travis to execute tests as root user.
The correction is to add a step to the cleanup process: the deletion of these artifacts through an ad-hoc docker instance.
During review of #6989, we saw that some of our test bash scripts were still used in the Boulder project in particular. It is about `tests/integration/_common.sh` in particular, to expose the `certbot_test` bash function, that is an appropriate way to execute a local version of certbot in test mode: define a custom server, remove several checks, full log and so on.
This PR is an attempt to assert this goal: exposing a new `certbot_test` executable for test purpose. More generally, this PR is about giving well suited scripts to quickly make manual tests against certbot without launching the full automated pytest suite.
The idea here is to leverage the existing logic in certbot-ci, and expose it as executable scripts. This is done thanks to the `console_scripts` entry of setuptools entrypoint feature, that install scripts in the `PATH`, when `pip install` is invoked, that delegate to specific functions in the installed packages.
Two scripts are defined this way:
* `certbot_test`: it executes certbot in test mode in a very similar way than the original `certbot_test` in `_common.sh`, by delegating to `certbot_integration_tests.utils.certbot_call:main`. By default this execution will target a pebble directory url started locally. The url, and also http-01/tls-alpn-01 challenge ports can be configured using ad-hoc environment variables. All arguments passed to `certbot_test` are transferred to the underlying certbot command.
* `acme_server`: it set up a fully running instance of an ACME server, ready for tests (in particular, all FQDN resolves to localhost in order to target a locally running `certbot_test` command) by delegating to `certbot_integration_tests.utils.acme_server:main`. The choice of the ACME server is given by the first parameter passed to `acme_server`, it can be `pebble`, `boulder-v1` or `boulder-v2`. The command keeps running on foreground, displaying the logs of the ACME server on stdout/stderr. The server is shut down and resources cleaned upon entering CTRL+C.
This two commands can be run also through the underlying python modules, that are executable.
Finally, a typical workflow on certbot side to run manual tests would be:
```
cd certbot
tools/venv.py
source venv/bin/activate
acme_server pebble &
certbot_test certonly --standalone -d test.example.com
```
On boulder side it could be:
```
# Follow certbot dev environment setup instructions, then ...
cd boulder
docker-compose run --use-aliases -e FAKE_DNS=172.17.0.1 --service-ports boulder ./start.py
SERVER=http://localhost:4001/directory certbot_test certonly --standalone -d test.example.com
```
* Configure certbot-ci to expose a certbot_test console script calling certbot in test mode against a local pebble instance
* Add a command to start pebble/boulder
* Use explicit start
* Add execution permission to acme_server
* Add a docstring to certbot_test function
* Change executable name
* Increase sleep to 3600s
* Implement a context manager to handle the acme server
* Add certbot_test workspace in .gitignore
* Add documentation
* Remove one function in context, split logic of certbot_test towards capturing non capturing
* Use an explicit an properly configured ACMEServer as handler.
* Add doc. Put constants.
* Modifications for misc
* Add some types
* Use os_rename
* Move rename into filesystem
* Use our os package
* Rename filesystem.rename to filesystem.replace
* Disable globally function redefined lint in os module
With the various optimizations already done and upcoming (certbot-ci), the time execution of integration tests have significantly decreased, allowing potentially a complete execution of a Travis PR job to be done within 5min30s.
However, one job is significantly longer that the other ones after this migration: this is nginx_compat, that takes more that 11min to finish. I tried to split the nginx_compat in terms of tested configuration and of tests to execute (auth, install, enhance). Both are not satisfactory:
splitting by configuration may work, but add a significant complexity in the tests
splitting by tests type is supported almost out-of-the-box, but fails to make two fast tests (see https://travis-ci.org/adferrand/certbot/builds/525892885?utm_source=github_status&utm_medium=notification for instance)
Since these tests are designed to check corner cases on the nginx parser, this is mostly useless to execute them on each PR, as the nginx parser is rarely updated.
After some discussion with @bmw, I think that we can just move the nginx_compat from the PR tests to the nightly tests. This PR does that.
* Revert "Add an option to dns_rfc2136 plugin to specify an authorative base domain. (#7029)"
This reverts commit 5ab6a597b0.
* Update changelog.
(cherry picked from commit 23b52ca1c8)
* Validate OCSP response for responders that are not the certificate's issuer.
* Improve OCSP tests using a issuer/responder pair for OCSP responses
* Clean code
* Update ocsp_test.py
* Add various comments
* Add several cases of ocsp responder. More factories for the resilience tests.
* Update ocsp_test.py
Fixes#6955.
This updates the Fedora version used in our test farm tests to Fedora 30. The AMI ID comes from https://alt.fedoraproject.org/cloud/ where it is listed as their standard HVM AMI for the region we use us-east-1 (US East (N. Virginia)).
Unfortunately, there were a lot of small changes required for this. The big reason for this is on Fedora, there isn't a Python 2 executable installed. In fact, there's not even an executable named python. It's just python3. Rather than installing another Python in each test, I wrote a script that the test scripts can share to figure out the different paths and names that should be used in their script. (This isn't used in test_sdists.sh because the logic is a little different.)
Other changes here worth flagging are:
I changed the name of the variable RUN_PYTHON3_TESTS in test_leauto_upgrades.sh to RUN_RHEL6_TESTS. The tests that are run when this variable is set test the upgrade from Python 2 to Python 3 on RHEL 6. I think this new name is much better now that we also have Fedora running Python 3.
I made tools/simple_http_server.py work on Python 3.
You can see tests passing with these changes at https://travis-ci.com/certbot/certbot/builds/113821476. I also ran test_tests.sh and they passed.
* Update to Fedora 30 in test farm tests.
Fedora 28 is likely to reach its EOL soon.
* Add set_python_envvars.sh.
* Fix test_apache2.sh on python3 only distros.
* Fix test_leauto_upgrades.sh on python3 systems.
* Fix certonly_standalone tests with python3 only
* Fix test_sdists.sh on python3 only distros.
* Make simple_http_server.py work on Python 3.
* add comments
* Ignore editor backups when running hooks.
When processing hooks, certbot also runs editor backups even though
such files are outdated, clearly warranted correction and may quite
possibly be defective.
That behavior could lead to unexpected breakage, and perhaps even pose
security risks---for example, if a previous script was careless with
file permissions. As an aggravating factor, the backup runs after the
corrected version and could unintentionally override a fix the user
thought was properly implemented.
This commit causes editor backup files ending in tilde (~) to be
excluded when running hooks.
Additional information can be found here:
https://github.com/certbot/certbot/issues/7107https://community.letsencrypt.org/t/editor-backup-files-executed-as-renewal-hooks/94750
* Add unit test for hook scripts with filenames ending in tilde.
* Provide changelog entry for not running hook scripts ending in tilde.
* Add Felix Lechner to the list of contributors.
Following discussion in #6947 (comment), I have second thoughts about relying on acme in certbot-ci.
Indeed, I think it is a good design to not rely in tests on the code you are testing. Obviously in unit tests it is very difficult, since most of the time the unit that is tested needs input generated by other part of the code. However it is not really a problem in a unit test, as its purpose is to make assertions about a specific portion of the code, not the others parts.
In the scope of integration tests, the software tested is treated as a black box. In this case, having some parts of the test logic that use in fact part of the code in the black box, increase the risk that some assertions compared two results coming from the same flawed logic from the tested software.
Since using acme in certbot-ci is only saving few lines of code, I think it does not worth the risk and the added complexity to declare acme as a dependency. I prefer to duplicate these lines and keep certbot-ci free of any dependency coming from the certbot project.
You can see the full test suite running at https://travis-ci.com/certbot/certbot/builds/112291892.
A few noteworthy things:
--fast is included because without, the tests would sometimes reach Travis' 50 minute timeout even with 1 test script per Travis build.
The only script that is run at release time which is not being run here is https://github.com/certbot/certbot/blob/master/tests/letstest/scripts/test_tests.sh because that script runs tests on the packages installed by certbot-auto which won't be updated until midway through a release.
We check TRAVIS_PULL_REQUEST and error out if it is not false for simplicity which should be fine because these tests are never run on PRs. The reason it's more complex to run test farm tests on PRs is the test farm tests need a named branch to pull from and Travis effectively merges the PR into the target branch before running tests complicating this.
I don't think this should block this PRs, but the one final change we may want to make to the current setup is #7071.
* Add encrypted private key.
* Add test farm tests to tox and travis.
* Change magic profile name.
* Further split test farm tests.
* Build local branch.
* more depth
* Set LOGDIR at top of script.
* Set sentinel at top of script.
* Don't use EC2 global to block on instance start.
* Remove global boto3 state.
* Pass in boulder_url.
* Create main function.
* Add link to reload docs.