Done by setting `autopilot.min_quorum = 3`.
Techncially, this would have been required to keep the test correct since
Consul's "autopilot" "Dead Server Cleanup" was enabled by default (I believe
that was in Consul 0.8). Practically, the issue only occurred with our NixOS
test with releases >= `1.7.0-beta2` (see #90613). The setting itself is
available since Consul 1.6.2.
However, this setting was not documented clearly enough for anybody to notice,
and only the upstream issue https://github.com/hashicorp/consul/issues/8118
I filed brought that to light.
As explained there, the test could also have been made pass by applying the
more correct rolling reboot procedure
-m.wait_until_succeeds("[ $(consul members | grep -o alive | wc -l) == 5 ]")
+m.wait_until_succeeds(
+ "[ $(consul operator raft list-peers | grep true | wc -l) == 3 ]"
+)
but we also intend to test that Consul can regain consensus even if
the quorum gets temporarily broken.
The systemd socket unit files now more precisely track the IPFS
configuration, by including any multaddr they can make a `ListenStream`
for. (The daemon doesn't currently support anything which would use
`ListDatagram`, so we don't need to worry about that.)
The tests use some of these features.
Reads a bit more naturally, and now the changes to the
acme-${cert}.service actually reflect what would be needed were you to
do the same in production.
e.g. "for dns-01, your service that needs the cert needs to pull in the
cert"
Refactor the systemd service definition for the haproxy reverse proxy,
using the upstream systemd service definition. This allows the service
to be reloaded on changes, preserving existing server state, and adds
some hardening options.
NixOS currently has issues with setting the FQDN of a system in a way
where standard tools work. In order to help with experimentation and
avoid regressions, add a test that checks that the hostname is
reported as the user wanted it to be.
Co-authored-by: Michael Weiss <dev.primeos@gmail.com>
Favor the configuration in "configFile" over "config" to allow
"configFile" to override "config" without a system rebuild.
Add a "persistentKeys" option to generate keys and addresses that
persist across service restarts. This is useful for self-configuring
boot media.
This adds a simple test running GNU Hello cross-compiled for armv7l and
aarch64 inside a x86_64 VM with configured binfmt.
We already build the cross toolchains in other invocations, and building
hello itself is small.
This test is sometimes flaky on hydra as at the time of the `git clone`
the network isn't really configured yet[1]. That problem doesn't seem to
occur locally but if you run it on a machine with high enough load (such
as hydra build machines). Hopefully this will make the test not flaky
anymore.
[1] https://hydra.nixos.org/build/118710378/nixlog/21/raw
This seems to have worked in 15f105d41f (5
months ago) but broke somewhere in the meantime.
The current module doesn't seem to be underdocumented and might need a
serious refactor. It requires quite some hacks to get it to work (see
https://github.com/NixOS/nixpkgs/issues/86305#issuecomment-621129942),
or how the ldap.nix test used systemd.services.openldap.preStart and
made quite some assumptions on internals.
Mic92 agreed on being added as a maintainer for the module, as he uses
it a lot and can possibly fix eventual breakages. For the most basic
startup breakages, the remaining openldap.nix test might suffice.
`doas` is a lighter alternative to `sudo` that "provide[s] 95% of the
features of `sudo` with a fraction of the codebase" [1]. I prefer it to
`sudo`, so I figured I would add a NixOS module in order for it to be
easier to use. The module is based off of the existing `sudo` module.
[1] https://github.com/Duncaen/OpenDoas
This is a follow-up to the PR #82026 that contains the promised tests.
In this test I am testing if we can properly propagate prefixes received
via DHCPv6 PD with the networkd options in our module system.
The comments in the test should be sufficient to follow the idea and
what is going on.
Setting up a XMPP chat server is a pretty deep rabbit whole to jump in
when you're not familiar with this whole universe. Your experience
with this environment will greatly depends on whether or not your
server implements the right set of XEPs.
To tackle this problem, the XMPP community came with the idea of
creating a meta-XEP in charge of listing the desirable XEPs to comply
with. This meta-XMP is issued every year under an new XEP number. The
2020 one being XEP-0423[1].
This prosody nixos module refactoring makes complying with XEP-0423
easier. All the necessary extensions are enabled by default. For some
extensions (MUC and HTTP_UPLOAD), we need some input from the user and
cannot provide a sensible default nixpkgs-wide. For those, we guide
the user using a couple of assertions explaining the remaining manual
steps to perform.
We took advantage of this substential refactoring to refresh the
associated nixos test.
Changelog:
- Update the prosody package to provide the necessary community
modules in order to comply with XEP-0423. This is a tradeoff, as
depending on their configuration, the user might end up not using them
and wasting some disk space. That being said, adding those will
allow the XEP-0423 users, which I expect to be the majority of
users, to leverage a bit more the binary cache.
- Add a muc submodule populated with the prosody muc defaults.
- Add a http_upload submodule in charge of setting up a basic http
server handling the user uploads. This submodule is in is
spinning up an HTTP(s) server in charge of receiving and serving the
user's attachments.
- Advertise both the MUCs and the http_upload endpoints using mod disco.
- Use the slixmpp library in place of the now defunct sleekxmpp for
the prosody NixOS test.
- Update the nixos test to setup and test the MUC and http upload
features.
- Add a couple of assertions triggered if the setup is not xep-0423
compliant.
[1] https://xmpp.org/extensions/xep-0423.html
When testing WireGuard updates, I usually run the VM-tests with
different kernels to make sure we're not introducing accidental
regressions for e.g. older kernels.
I figured that we should automate this process to ensure continuously
that WireGuard works fine on several kernels.
For now I decided to test the latest LTS version (5.4) and
the latest kernel (currently 5.6). We can add more kernels in the
future, however this seems to significantly slow down evaluation and
time.
The list can be customized by running a command like this:
nix-build nixos/tests/wireguard --arg kernelVersionsToTest '["4.19"]'
The `kernelPackages` argument in the tests is null by default to make
sure that it's still possible to invoke the test-files directly. In that
case the default kernel of NixOS (currently 5.4) is used.
The elasticsearch-curator was not deleting indices because the indices
had ILM policies associated with them. This is now fixed by
configuring the elasticsearch-curator with `allow_ilm_indices: true`.
Also see: https://github.com/elastic/curator/issues/1490
Enables multi-site configurations.
This break compatibility with prior configurations that expect options
for a single dokuwiki instance in `services.dokuwiki`.
Shimming out the Let's Encrypt domain name to reuse client configuration
doesn't work properly (Pebble uses different endpoint URL formats), is
recommended against by upstream,[1] and is unnecessary now that the ACME
module supports specifying an ACME server. This commit changes the tests
to use the domain name acme.test instead, and renames the letsencrypt
node to acme to reflect that it has nothing to do with the ACME server
that Let's Encrypt runs. The imports are renamed for clarity:
* nixos/tests/common/{letsencrypt => acme}/{common.nix => client}
* nixos/tests/common/{letsencrypt => acme}/{default.nix => server}
The test's other domain names are also adjusted to use *.test for
consistency (and to avoid misuse of non-reserved domain names such
as standalone.com).
[1] https://github.com/letsencrypt/pebble/issues/283#issuecomment-545123242
Co-authored-by: Yegor Timoshenko <yegortimoshenko@riseup.net>
This was added in aade4e577b, but the
implementation of the ACME module has been entirely rewritten since
then, and the test seems to run fine on AArch64.
linux-hardened sets kernel.unprivileged_userns_clone=0 by default; see
anthraxx/linux-hardened@104f44058f.
This allows the Nix sandbox to function while reducing the attack
surface posed by user namespaces, which allow unprivileged code to
exercise lots of root-only code paths and have lead to privilege
escalation vulnerabilities in the past.
We can safely leave user namespaces on for privileged users, as root
already has root privileges, but if you're not running builds on your
machine and really want to minimize the kernel attack surface then you
can set security.allowUserNamespaces to false.
Note that Chrome's sandbox requires either unprivileged CLONE_NEWUSER or
setuid, and Firefox's silently reduces the security level if it isn't
allowed (see about:support), so desktop users may want to set:
boot.kernel.sysctl."kernel.unprivileged_userns_clone" = true;
* nixos/k3s: simplify config expression
* nixos/k3s: add config assertions and trim unneeded bits
* nixos/k3s: add a test that k3s works; minor module improvements
This is a single-node test. Eventually we should also have a multi-node
test to verify the agent bit works, but that one's more involved.
* nixos/k3s: add option description
* nixos/k3s: add defaults for token/serveraddr
Now that the assertion enforces their presence, we dont' need to use the typesystem for it.
* nixos/k3s: remove unneeded sudo in test
* nixos/k3s: add to test list
For reasons yet unknown, the vxlan backend doesn't work (at least inside
the qemu networking), so this is moved to the udp backend.
Note changing the backend apparently also changes the interface name,
it's now `flannel0`, not `flannel.1`
fixes #74941
This was whitespace-sensitive, kept fighting with my editor and broke
the tests easily. To fix this, let python convert the output to
individual lines, and strip whitespace from them before comparing.
Also removed `pkgs.hydra-flakes` since flake-support has been merged
into master[1]. Because of that, `pkgs.hydra-unstable` is now compiled
against `pkgs.nixFlakes` and currently requires a patch since Hydra's
master doesn't compile[2] atm.
[1] https://github.com/NixOS/hydra/pull/730
[2] https://github.com/NixOS/hydra/pull/732
Upgrades Hydra to the latest master/flake branch. To perform this
upgrade, it's needed to do a non-trivial db-migration which provides a
massive performance-improvement[1].
The basic ideas behind multi-step upgrades of services between NixOS versions
have been gathered already[2]. For further context it's recommended to
read this first.
Basically, the following steps are needed:
* Upgrade to a non-breaking version of Hydra with the db-changes
(columns are still nullable here). If `system.stateVersion` is set to
something older than 20.03, the package will be selected
automatically, otherwise `pkgs.hydra-migration` needs to be used.
* Run `hydra-backfill-ids` on the server.
* Deploy either `pkgs.hydra-unstable` (for Hydra master) or
`pkgs.hydra-flakes` (for flakes-support) to activate the optimization.
The steps are also documented in the release-notes and in the module
using `warnings`.
`pkgs.hydra` has been removed as latest Hydra doesn't compile with
`pkgs.nixStable` and to ensure a graceful migration using the newly
introduced packages.
To verify the approach, a simple vm-test has been added which verifies
the migration steps.
[1] https://github.com/NixOS/hydra/pull/711
[2] https://github.com/NixOS/nixpkgs/pull/82353#issuecomment-598269471
While our ETag patch works pretty fine if it comes to serving data off
store paths, it unfortunately broke something that might be a bit more
common, namely when using regexes to extract path components of
location directives for example.
Recently, @devhell has reported a bug with a nginx location directive
like this:
location ~^/\~([a-z0-9_]+)(/.*)?$" {
alias /home/$1/public_html$2;
}
While this might look harmless at first glance, it does however cause
issues with our ETag patch. The alias directive gets broken up by nginx
like this:
*2 http script copy: "/home/"
*2 http script capture: "foo"
*2 http script copy: "/public_html/"
*2 http script capture: "bar.txt"
In our patch however, we use realpath(3) to get the canonicalised path
from ngx_http_core_loc_conf_s.root, which returns the *configured* value
from the root or alias directive. So in the example above, realpath(3)
boils down to the following syscalls:
lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/$1", 0x7ffd08da6f60) = -1 ENOENT (No such file or directory)
During my review[1] of the initial patch, I didn't actually notice that
what we're doing here is returning NGX_ERROR if the realpath(3) call
fails, which in turn causes an HTTP 500 error.
Since our patch actually made the canonicalisation (and thus additional
syscalls) necessary, we really shouldn't introduce an additional error
so let's - at least for now - silently skip return value if realpath(3)
has failed.
However since we're using the unaltered root from the config we have
another issue, consider this root:
/nix/store/...-abcde/$1
Calling realpath(3) on this path will fail (except if there's a file
called "$1" of course), so even this fix is not enough because it
results in the ETag not being set to the store path hash.
While this is very ugly and we should fix this very soon, it's not as
serious as getting HTTP 500 errors for serving static files.
I added a small NixOS VM test, which uses the example above as a
regression test.
It seems that my memory is failing these days, since apparently I *knew*
about this issue since digging for existing issues in nixpkgs, I found
this similar pull request which I even reviewed:
https://github.com/NixOS/nixpkgs/pull/66532
However, since the comments weren't addressed and the author hasn't
responded to the pull request, I decided to keep this very commit and do
a follow-up pull request.
[1]: https://github.com/NixOS/nixpkgs/pull/48337
Signed-off-by: aszlig <aszlig@nix.build>
Reported-by: @devhell
Acked-by: @7c6f434c
Acked-by: @yorickvP
Merges: https://github.com/NixOS/nixpkgs/pull/80671
Fixes: https://github.com/NixOS/nixpkgs/pull/66532
Dropbear lags behind OpenSSH significantly in both support for modern
key formats like `ssh-ed25519`, let alone the recently-introduced
U2F/FIDO2-based `sk-ssh-ed25519@openssh.com` (as I found when I switched
my `authorizedKeys` over to it and promptly locked myself out of my
server's initrd SSH, breaking reboots), as well as security features
like multiprocess isolation. Using the same SSH daemon for stage-1 and
the main system ensures key formats will always remain compatible, as
well as more conveniently allowing the sharing of configuration and
host keys.
The main reason to use Dropbear over OpenSSH would be initrd space
concerns, but NixOS initrds are already large (17 MiB currently on my
server), and the size difference between the two isn't huge (the test's
initrd goes from 9.7 MiB to 12 MiB with this change). If the size is
still a problem, then it would be easy to shrink sshd down to a few
hundred kilobytes by using an initrd-specific build that uses musl and
disables things like Kerberos support.
This passes the test and works on my server, but more rigorous testing
and review from people who use initrd SSH would be appreciated!
This mirrors the behaviour of systemd - It's udev that parses `.link`
files, not `systemd-networkd`.
This was originally applied in 36ef112a47,
but was reverted due to 1115959a8d causing
evaluation errors on hydra.
...even when networkd is disabled
This reverts commit ce78f3ac70, reversing
changes made to dc34da0755.
I'm sorry; Hydra has been unable to evaluate, always returning
> error: unexpected EOF reading a line
and I've been unable to reproduce the problem locally. Bisecting
pointed to this merge, but I still can't see what exactly was wrong.
- Fix misspelled option. mkRenamedOptionModule is not used because the
option hasn't really worked before.
- Add missing cfg.telemetryPath arg to ExecStart.
- Fix mkdir invocation in test.
Add a cage module to nixos. This can be used to make kiosk-style
systems that boot directly to a single application. The user (demo by
default) is automatically logged in by this service and the
program (xterm by default) is automatically started.
This is useful for some embedded, single-user systems where we want
automatic booting. To keep the system secure, the user should have
limited privileges.
Based on the service provided in the Cage wiki here:
https://github.com/Hjdskes/cage/wiki/Starting-Cage-on-boot-with-systemd
Co-Authored-By: Florian Klink <flokli@flokli.de>
* prometheus-nginx-exporter: 0.5.0 -> 0.6.0
* nixos/prometheus-nginx-exporter: update for 0.6.0
Added new option constLabels and updated virtualHost name in the
exporter's test.
This is useful when buildLayeredImage is called in a generic way
that should allow simple (base) images to be built, which may not
reference any store paths.
I am not sure how this ever passed on hydra but 30s is barely enough to
pass the configure phase of opensmtpd. It is likely the package was
built as part of another jobset. Whenever it is built as part of the
test execution the timeout propagates and 30s is clearly not enough for
that.
The subtest was mainly written to demonstrate the VRF-issues with a
5.x-kernel. However this breaks the entire test now as we have 5.4 as
default kernel. Disabling the test for now, I still need to find some
time to investigate.