Adds a function to wait for a new QMP event with a model filter
so that you can expect specific type of events with specific payloads.
e.g. a guest-reset-induced shutdown event.
Building a python environment with python3Minimal requires hydra
to bootstrap pip and build all packages used in the environment
which would otherwise not be built. This reduces cache re-use and duplicates things.
Also common dependencies normally included in python itself
are not properly checked and can cause hard to debug errors
because everyone just assumes those modules are there.
Aliases exist for a reason. Sure it is nice to make sure that
some aliases aren't used within Nixpkgs, but this creates two problems
which are far worse than your failing to meet your neatness compulsions.
- Users encounter missing attributes, https://github.com/NixOS/nixpkgs/issues/264577
wasting their time, stalling their progress, and even occupying others
time that would be better spent on fixing *real* issues.
- Hydra doesn't treat evaluation errors seriously enough, with the
effect that actual relevant test failures are masked by evaluation
failures such as those caused by this no aliases business.
- We don't even have the infrastructure to get rid of aliases, because
all warnings in package attributes are disallowed by Nixpkgs CI
tooling, last I checked.
Before re-disabling this, make sure that
- An actually helpful deprecation process is in place.
- Aliases are still allowed when `nixos-lib.runTests` and
`pkgs.testers.runNixOSTest` are invoked by external projects.
For instance, `all-tests.nix` could provide such an
override (e.g. with `newScope`).
This is a fixup for c1ae82f448.
nix' `passAsFile` does not create empty files for variables that are
`null`.
This results in the following error for units that have no overrides or
content, but are, e.g. `wantedBy`:
`mv: cannot stat '': No such file or directory`.
Minimal reproducer:
`systemd.units.empty.wantedBy = [ "multi-user.target" ];`
This is often necessary when a unit is loaded in via `systemd.packages`.
From now on, we will aim to ensure that the test driver
gets tested by OfBorg using all our available tests.
This commit adds the driver timeout test to the driver.
For `testBuildFailure` and similar functions, we need a full blown derivation and not a lazy one.
This is an internal option for test framework developers.
Since the debut of the test-driver, we didn't obtain
a race timer with the test execution to ensure that tests doesn't run beyond
a certain amount of time.
This is particularly important when you are running into hanging tests
which cannot be detected by current facilities (requires more pvpanic wiring up, QMP
API stuff, etc.).
Two easy examples:
- Some QEMU tests may get stuck in some situation and run for more than 24 hours → we default to 1 hour max.
- Some QEMU tests may panic in the wrong place, e.g. UEFI firmware or worse → end users can set a "reasonable" amount of time
And then, we should let the retry logic retest them until they succeed and adjust
their global timeouts.
Of course, this does not help with the fact that the timeout may need to be
a function of the actual busyness of the machine running the tests.
This is only one step towards increased reliability.
Now that we have a QMP client, we can wire it up in the test driver.
For now, it is almost completely useless because of the need of a constant "event loop", especially
for event listening.
In the next commits, we will slowly enable more and more usecases.
For some time now the attrset returned by `evalModules` has
`type = "configuration"`.
This is a clean refactor because the name is not exposed.
(never is for simple lambda)
When listening on unix sockets, it doesn't make sense to specify a port
for nginx's listen directive.
Since nginx defaults to port 80 when the port isn't specified (but the
address is), we can change the default for the option to null as well
without changing any behaviour.
This also makes configuration available if you just run those tools locally.
Also use ruff instead of pylint because it's faster and more
comprehensive.
Since 008f9f0cd4
("nixos/test-driver: actually use the backdoor message to wait for backdoor"),
when boot is still computering, we can get a tons of empty strings in response to the shell.
This is not really useful to print and waste the disk space for any CI system that logs them.
We stop logging chunks whenever they are empty.
While working on #192270, I noticed that only some wait_for_* helper
functions make the timeout configurable. I think we should be able to
customize it in all cases
https://git.sr.ht/~c00w/nixpkgs/tree/sdimagebtrfs/item/nixos/lib/make-btrfs-fs.nix
I made only one change which was to use `btrfs check` instead of
`fsck.btrfs` because of this warning
```
btrfs-fs.img> ++ fsck.btrfs /nix/store/6d46rc768c140asy6rjpc5rk568r36zq-btrfs-fs.img
btrfs-fs.img> If you wish to check the consistency of a BTRFS filesystem or
btrfs-fs.img> repair a damaged filesystem, see btrfs(8) subcommand 'check'.
```
Co-authored-by: Colin L Rice <colin@daedrum.net>
This commit is a fixup for a regression introduced by
0bdba6c99b.
Before the regression, it was possible to build images without grub or a
kernel (e.g. to boot other kernels with qemu -kernel.
After the regression, such images fail to build. Since
config.boog.loader.grub.enable is false in that scenario, grub.device is
emptystring. While this happens not to be an issue of `ln`, `dirname`
fails on emptystring.
With this change, we guard both commands to only be run when grub is
actually enabled. Images with and without grub succesfully build with
this change.
New EDK2 sets up the backdoor port as a serial console, which feeds the test driver
a bunch of boot logs it can safely ignore. Do so by waiting for the message the
backdoor shell prints before doing anything else.
According to systemd.netdev manpage:
```
MACAddress=
Specifies the MAC address to use for the device, or takes the special value "none". When "none", systemd-networkd does not request the MAC address for
the device, and the kernel will assign a random MAC address. For "tun", "tap", or "l2tp" devices, the MACAddress= setting in the [NetDev] section is
not supported and will be ignored. Please specify it in the [Link] section of the corresponding systemd.network(5) file. If this option is not set,
"vlan" device inherits the MAC address of the master interface. For other kind of netdevs, if this option is not set, then the MAC address is
generated based on the interface name and the machine-id(5).
Note, even if "none" is specified, systemd-udevd will assign the persistent MAC address for the device, as 99-default.link has
MACAddressPolicy=persistent. So, it is also necessary to create a custom .link file for the device, if the MAC address assignment is not desired.
```
Therefore, `none` is an acceptable value.
When lib overrides were used, before this commit, they would not be made
available in the configuration evaluation of nixosTest's nodes.
Sample code:
``` nix
let
pkgs = import ./. {
overlays = [
(new: old: {
lib = old.lib.extend (self: super: {
sorry_dave = builtins.trace "There are no pod bay doors" "sorry dave";
});
})
];
};
in
pkgs.testers.nixosTest {
name = "demo lib overlay";
nodes = {
machine = { lib, ... }: {
environment.etc."got-lib-overlay".text = lib.sorry_dave;
};
};
testScript = { nodes }:
''
start_all()
machine.succeed('grep dave /etc/got-lib-overlay')
'';
}
```
By some miracle, before, it was possible to reconnect to the `node1` without
doing any relevant dance.
But now we are direct booting (¿), it seems like we need to do the right things.
This introduces a `check_output` flag for `execute` because we do not want to steal the
messages from the backdoor service as we might execute the kexec too fast compared
to when we will reconnect.
Therefore, we will let the message in the pipe if needed.
This change removes the bespoke logic around identifying block devices.
Instead of trying to find the right device by iterating over
`qemu.drives` and guessing the right partition number (e.g.
/dev/vda{1,2}), devices are now identified by persistent names provided
by udev in /dev/disk/by-*.
Before this change, the root device was formatted on demand in the
initrd. However, this makes it impossible to use filesystem identifiers
to identify devices. Now, the formatting step is performed before the VM
is started. Because some tests, however, rely on this behaviour, a
utility function to replace this behaviour in added in
/nixos/tests/common/auto-format-root-device.nix.
Devices that contain neither a partition table nor a filesystem are
identified by their hardware serial number which is injecetd via QEMU
(and is thus persistent and predictable). PCI paths are not a reliably
way to identify devices because their availability and numbering depends
on the QEMU machine type.
This change makes the module more robust against changes in QEMU and the
kernel (non-persistent device naming) and by decoupling abstractions
(i.e. rootDevice, bootPartition, and bootLoaderDevice) enables further
improvement down the line.