Closes #106565
When generating multiple certificates which all
share the same server + email, lego will attempt
to create an account multiple times. By adding an
account creation target certificates which share
an account will wait for one service (chosen at
config build time) to complete first.
Attempting to reuse keys on a basis different to the cert (AKA,
storing the key in a directory with a hashed name different to
the cert it is associated with) was ineffective since when
"lego run" is used it will ALWAYS generate a new key. This causes
issues when you revert changes since your "reused" key will not
be the one associated with the old cert. As such, I tore out the
whole keyDir implementation.
As for the race condition, checking the mtime of the cert file
was not sufficient to detect changes. In testing, selfsigned
and full certs could be generated/installed within 1 second of
each other. cmp is now used instead.
Also, I removed the nginx/httpd reload waiters in favour of
simple retry logic for the curl-based tests
Testing of certs failed randomly when the web server was still
returning old certs even after the reload was "complete". This was
because the reload commands send process signals and do not wait
for the worker processes to restart. This commit adds log watchers
which wait for the worker processes to be restarted.
- Use an acme user and group, allow group override only
- Use hashes to determine when certs actually need to regenerate
- Avoid running lego more than necessary
- Harden permissions
- Support "systemctl clean" for cert regeneration
- Support reuse of keys between some configuration changes
- Permissions fix services solves for previously root owned certs
- Add a note about multiple account creation and emails
- Migrate extraDomains to a list
- Deprecate user option
- Use minica for self-signed certs
- Rewrite all tests
I thought of a few more cases where things may go wrong,
and added tests to cover them. In particular, the web server
reload services were depending on the target - which stays alive,
meaning that the renewal timer wouldn't be triggering a reload
and old certs would stay on the web servers.
I encountered some problems ensuring that the reload took place
without accidently triggering it as part of the test. The sync
commands I added ended up being essential and I'm not sure why,
it seems like either node.succeed ends too early or there's an
oddity of the vm's filesystem I'm not aware of.
- Fix duplicate systemd rules on reload services
Since useACMEHost is not unique to every vhost, if one cert
was reused many times it would create duplicate entries in
${server}-config-reload.service for wants, before and
ConditionPathExists
Reads a bit more naturally, and now the changes to the
acme-${cert}.service actually reflect what would be needed were you to
do the same in production.
e.g. "for dns-01, your service that needs the cert needs to pull in the
cert"
Shimming out the Let's Encrypt domain name to reuse client configuration
doesn't work properly (Pebble uses different endpoint URL formats), is
recommended against by upstream,[1] and is unnecessary now that the ACME
module supports specifying an ACME server. This commit changes the tests
to use the domain name acme.test instead, and renames the letsencrypt
node to acme to reflect that it has nothing to do with the ACME server
that Let's Encrypt runs. The imports are renamed for clarity:
* nixos/tests/common/{letsencrypt => acme}/{common.nix => client}
* nixos/tests/common/{letsencrypt => acme}/{default.nix => server}
The test's other domain names are also adjusted to use *.test for
consistency (and to avoid misuse of non-reserved domain names such
as standalone.com).
[1] https://github.com/letsencrypt/pebble/issues/283#issuecomment-545123242
Co-authored-by: Yegor Timoshenko <yegortimoshenko@riseup.net>
The recent custom endpoint addition allows us to directly point
certbot to the custom Pebble directory endpoint.
Thanks to that, we can ditch the Pebble patch we were using so far;
making this test maintenance easier.
Let's encrypt bumped ACME to V2. We need to update our nixos test to
be compatible with this new protocol version.
We decided to drop the Boulder ACME server in favor of the more
integration test friendly Pebble.
- overriding cacert not necessary
- this avoids rebuilding lots of packages needlessly
- nixos/tests/acme: use pebble's ca for client tests
- pebble always generates its own ca which has to be fetched
TODO: write proper commit msg :)
* nixos/acme: Fix ordering of cert requests
When subsequent certificates would be added, they would
not wake up nginx correctly due to target units only being triggered
once. We now added more fine-grained systemd dependencies to make sure
nginx always is aware of new certificates and doesn't restart too early
resulting in a crash.
Furthermore, the acme module has been refactored. Mostly to get
rid of the deprecated PermissionStartOnly systemd options which were
deprecated. Below is a summary of changes made.
* Use SERVICE_RESULT to determine status
This was added in systemd v232. we don't have to keep track
of the EXITCODE ourselves anymore.
* Add regression test for requesting mutliple domains
* Deprecate 'directory' option
We now use systemd's StateDirectory option to manage
create and permissions of the acme state directory.
* The webroot is created using a systemd.tmpfiles.rules rule
instead of the preStart script.
* Depend on certs directly
By getting rid of the target units, we make sure ordering
is correct in the case that you add new certs after already
having deployed some.
Reason it broke before: acme-certificates.target would
be in active state, and if you then add a new cert, it
would still be active and hence nginx would restart
without even requesting a new cert. Not good! We
make the dependencies more fine-grained now. this should fix that
* Remove activationDelay option
It complicated the code a lot, and is rather arbitrary. What if
your activation script takes more than activationDelay seconds?
Instead, one should use systemd dependencies to make sure some
action happens before setting the certificate live.
e.g. If you want to wait until your cert is published in DNS DANE /
TLSA, you could create a unit that blocks until it appears in DNS:
```
RequiredBy=acme-${cert}.service
After=acme-${cert}.service
ExecStart=publish-wait-for-dns-script
```
In 0c7c1660f7 I have set allowSubstitutes
to false, which avoided the substitution of the certificates.
Unfortunately substitution may still happen later when the certificate
is merged with the CA bundle. So the merged CA bundle might be
substituted from a binary cache but the certificate itself is built
locally, which could result in a different certificate in the bundle.
So instead of adding just yet another workaround, I've now hardcoded all
the certificates and keys in a separate file. This also moves
letsencrypt.nix into its own directory so we don't mess up
nixos/tests/common too much.
This was long overdue and should finally make the dependency graph for
the ACME test more deterministic.
Signed-off-by: aszlig <aszlig@nix.build>
Quoting from @FRidh:
Note overridePythonAttrs exists since 17.09. It overrides the call to
buildPythonPackage.
While it's not strictly necessary to do this, because postPatch ends up
in drvAttrs anyway, it's probably better to use overridePythonAttrs so
we don't run into problems when the underlying implementation of
buildPythonPackage changes.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Since 67651d80bc the requests package now
depends on certifi, which in turn provides the CA root certificates that
we need to replace.
It might also be a good idea to actually patch certifi with our version
of cacert by default so that if we want to override and/or add something
we only need to do it once.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @fpletz, @k0ral, @FRidh
The test here is pretty basic and only tests nginx, but it should get us
started to write tests for different webservers and different ACME
implementations.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>