This module permits to preload Docker image in a VM in order to reduce
OIs on file copies. This module has to be only used in testing
environments, when the test requires several Docker images such as in
Kubernetes tests. In this case,
`virtualisation.dockerPreloader.images` can replace the
`services.kubernetes.kubelet.seedDockerImages` options.
The idea is to populate the /var/lib/docker directory by mounting qcow
files (we uses qcow file to avoid permission issues) that contain images.
For each image specified in
config.virtualisation.dockerPreloader.images:
1. The image is loaded by Docker in a VM
2. The resulting /var/lib/docker is written to a QCOW file
This set of QCOW files can then be used to populate the
/var/lib/docker:
1. Each QCOW is mounted in the VM
2. Symlink are created from these mount points to /var/lib/docker
3. A /var/lib/docker/image/overlay2/repositories.json file is generated
4. The docker daemon is started.
`services.virtualisation.libvirtd.onShutdown` was previously unused.
While suspending a domain on host shutdown is the default, this commit
makes it so domains can be shut down, also.
Previously, setting "privateNetwork = true" without specifying host and
local addresses would create unconfigured interfaces: ve-$INSTANCE on the host
and eth0 inside the container.
These changes is rebased part of the original PR #3021.
100GB breaks cptofs but 50GB is fine and benchmarks shows it takes the same time as building the demo VBox VM with a 10GB disk
+ enabled VM sound output by default
+ set USB controller in USB2.0 mode
+ add manifest file in the OVA as it allows integrity checking on imports
* Lets container@.service be activated by machines.target instead of
multi-user.target
According to the systemd manpages, all containers that are registered
by machinectl, should be inside machines.target for easy stopping
and starting container units altogether
* make sure container@.service and container.slice instances are
actually located in machine.slice
https://plus.google.com/112206451048767236518/posts/SYAueyXHeEX
See original commit: https://github.com/NixOS/systemd/commit/45d383a3b8
* Enable Cgroup delegation for nixos-containers
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
This is equivalent to enabling all accounting options on the systemd
process inside the system container. This means that systemd inside
the container is responsible for managing Cgroup resources for
unit files that enable accounting options inside. Without this
option, units that make use of cgroup features within system
containers might misbehave
See original commit: https://github.com/NixOS/systemd/commit/a931ad47a8
from the manpage:
Turns on delegation of further resource control partitioning to
processes of the unit. Units where this is enabled may create and
manage their own private subhierarchy of control groups below the
control group of the unit itself. For unprivileged services (i.e.
those using the User= setting) the unit's control group will be made
accessible to the relevant user. When enabled the service manager
will refrain from manipulating control groups or moving processes
below the unit's control group, so that a clear concept of ownership
is established: the control group tree above the unit's control
group (i.e. towards the root control group) is owned and managed by
the service manager of the host, while the control group tree below
the unit's control group is owned and managed by the unit itself.
Takes either a boolean argument or a list of control group
controller names. If true, delegation is turned on, and all
supported controllers are enabled for the unit, making them
available to the unit's processes for management. If false,
delegation is turned off entirely (and no additional controllers are
enabled). If set to a list of controllers, delegation is turned on,
and the specified controllers are enabled for the unit. Note that
additional controllers than the ones specified might be made
available as well, depending on configuration of the containing
slice unit or other units contained in it. Note that assigning the
empty string will enable delegation, but reset the list of
controllers, all assignments prior to this will have no effect.
Defaults to false.
Note that controller delegation to less privileged code is only safe
on the unified control group hierarchy. Accordingly, access to the
specified controllers will not be granted to unprivileged services
on the legacy hierarchy, even when requested.
The following controller names may be specified: cpu, cpuacct, io,
blkio, memory, devices, pids. Not all of these controllers are
available on all kernels however, and some are specific to the
unified hierarchy while others are specific to the legacy hierarchy.
Also note that the kernel might support further controllers, which
aren't covered here yet as delegation is either not supported at all
for them or not defined cleanly.
When logging into a container by using
nixos-container root-login
all nix-related commands in the container would fail, as they
tried to modify the nix db and nix store, which are mounted
read-only in the container. We want nixos-container to not
try to modify the nix store at all, but instead delegate
any build commands to the nix daemon of the host operating system.
This already works for non-root users inside a nixos-container,
as it doesn't 'own' the nix-store, and thus defaults
to talking to the daemon socket at /nix/var/nix/daemon-socket/,
which is bind-mounted to the host daemon-socket, causing all nix
commands to be delegated to the host.
However, when we are the root user inside the container, we have the
same uid as the nix store owner, eventhough it's not actually
the same root user (due to user namespaces). Nix gets confused,
and is convinced it's running in single-user mode, and tries
to modify the nix store directly instead.
By setting `NIX_REMOTE=daemon` in `/etc/profile`, we force nix
to operate in multi-user mode, so that it will talk to the host
daemon instead, which will modify the nix store for the container.
This fixes #40355
Several service definitions used `mkEnableOption` with text starting
with "Whether to", which produced funny option descriptions like
"Whether to enable Whether to run the rspamd daemon..".
This commit corrects this, and adds short descriptions of services
to affected service definitions.
This reverts commit f777d2b719.
cc #34409
This breaks evaluation of the tested job:
attribute 'diskInterface' missing, at /nix/store/5k9kk52bv6zsvsyyvpxhm8xmwyn2yjvx-source/pkgs/build-support/vm/default.nix:316:24
10G is not enough for a desktop installation, and resizing a Virtualbox disk image is a pain.
Let's increase the default disk size to 100G. It does not require more storage space, since the empty bits are left out.
And don't need to source the uevent files anymore either since $MAJOR
or $MINOR aren't used elsewhere.
[dezgeg: The reason these are no longer needed is that 0d27df280f
switched /tmp to a devtmpfs which automatically creates such device
nodes]
This reverts commit 095fe5b43d.
Pointless renames considered harmful. All they do is force people to
spend extra work updating their configs for no benefit, and hindering
the ability to switch between unstable and stable versions of NixOS.
Like, what was the value of having the "nixos." there? I mean, by
definition anything in a NixOS module has something to do with NixOS...
* nixos/virtualbox: Adds more options to virtualbox-image.nix
Previously you could only set the size of the disk.
This change adds the ability to change the amount of memory
that the image gets, along with the name / derivation name /
file name for the VM.
* Incorporates some review feedback
Regression introduced by d4468bedb5.
No systemd messages are shown anymore during VM test runs, which is not
very helpful if you want to find out about failures.
There is a bit of a conflict between testing and the change that
introduced the regression. While the mentioned commit makes sure that
the primary console is tty0 for virtualisation.graphics = false, our VM
tests need to have the serial console as primary console.
So in order to support both, I added a new virtualisation.qemu.consoles
option, which allows to specify those options using the module system.
The default of this option is to use the changes that were introduced
and in test-instrumentation.nix we use only the serial console the same
way as before.
For test-instrumentation.nix I didn't add a baudrate to the serial
console because I can't find a reason on top of my head why it should
need it. There also wasn't a reason stated when that was introduced in
7499e4a5b9.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @flokli, @dezgeg, @edolstra
Always enable both tty and serial console, but set preferred console
depending on cfg.graphical.
Even in qemu graphical mode, you can switch to the serial console via
Ctrl+Alt+3.
With that being done, you also don't need to specify
`systemd.services."serial-getty@ttyS0".enable = true;` either as described in
https://nixos.wiki/wiki/Cheatsheet#Building_a_service_as_a_VM_.28for_testing.29,
as systemd automatically spawns a getty on consoles passwd via cmdline.
This also means, vms built by 'nixos-rebuild build-vm' can simply be run
properly in nographic mode by appending `-nographic` to `result/bin/run-*-vm`,
without the need to explicitly add platform-specific QEMU_KERNEL_PARAMS.
The ability to specify "-drive if=scsi" has been removed in QEMU version
2.12 (introduced in 3e3b39f173).
Quote from https://wiki.qemu.org/ChangeLog/2.12#Incompatible_changes:
> The deprecated way of configuring SCSI devices with "-drive if=scsi"
> on x86 has been removed. Use an appropriate SCSI controller together
> "-device scsi-hd" or "-device scsi-cd" and a corresponding "-blockdev"
> parameter instead.
So whenever the diskInterface is "scsi" we use the new way to specify
the drive and fall back to the deprecated way for the time being. The
reason why I'm not using the new way for "virtio" and "ide" as well is
because there is no simple generic way anymore to specify these.
This also turns the type of the virtualisation.qemu.diskInterface option
to be an enum, so the user knows which values are allowed but we can
also make sure the right value is provided to prevent typos.
I've tested this against a few non-disk-related NixOS VM tests but also
the installer.grub1 test (because it uses "ide" as its drive interface),
the installer.simple test (just to be sure it still works with
"virtio") and all the tests in nixos/tests/boot.nix.
In order to be able to run the grub1 test I had to go back to
8b1cf100cd (which is a known commit where
that test still works) and apply the QEMU update and this very commit,
because right now the test is broken.
Apart from the tests here in nixpkgs, I also ran another[1] test in
another repository which uses the "scsi" disk interface as well (in
comparison to most of the installer tests, this one actually failed
prior to this commit).
All of them now succeed.
[1]: 9b5a119972/tests/system/kernel/bfq.nix
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edostra, @grahamc, @dezgeg, @abbradar, @ts468
mke2fs has this annoying property that it uses getrandom() to get random
numbers (for whatever purposes) which blocks until the kernel's secure
RNG has sufficient entropy, which it usually doesn't in the early boot
(except if your CPU supports RDRAND) where we may need to create the
root disk.
So let's give the VM a virtio RNG to avoid the boot getting stuck at
mke2fs.
Allow out of band communication between qemu VMs and the host.
Useful to retrieve IPs of VMs from the host (for instance when libvirt can't analyze
DHCP requests because VMs are configured with static addresses or when
there is connectivity default).
Following legacy packing conventions, `isArm` was defined just for
32-bit ARM instruction set. This is confusing to non packagers though,
because Aarch64 is an ARM instruction set.
The official ARM overview for ARMv8[1] is surprisingly not confusing,
given the overall state of affairs for ARM naming conventions, and
offers us a solution. It divides the nomenclature into three levels:
```
ISA: ARMv8 {-A, -R, -M}
/ \
Mode: Aarch32 Aarch64
| / \
Encoding: A64 A32 T32
```
At the top is the overall v8 instruction set archicture. Second are the
two modes, defined by bitwidth but differing in other semantics too, and
buttom are the encodings, (hopefully?) isomorphic if they encode the
same mode.
The 32 bit encodings are mostly backwards compatible with previous
non-Thumb and Thumb encodings, and if so we can pun the mode names to
instead mean "sets of compatable or isomorphic encodings", and then
voilà we have nice names for 32-bit and 64-bit arm instruction sets
which do not use the word ARM so as to not confused either laymen or
experienced ARM packages.
[1]: https://developer.arm.com/products/architecture/a-profile
- `localSystem` is added, it strictly supercedes system
- `crossSystem`'s description mentions `localSystem` (and vice versa).
- No more weird special casing I don't even understand
TEMP
Resolved the following conflicts (by carefully applying patches from the both
branches since the fork point):
pkgs/development/libraries/epoxy/default.nix
pkgs/development/libraries/gtk+/3.x.nix
pkgs/development/python-modules/asgiref/default.nix
pkgs/development/python-modules/daphne/default.nix
pkgs/os-specific/linux/systemd/default.nix
- Add a new parameter `imageType` that can specify either "efi" or
"legacy" (the default which should see no change in behaviour by
this patch).
- EFI images get a GPT partition table (instead of msdos) with a
mandatory ESP partition (so we add an assert that `partitioned`
is true).
- Use the partx tool from util-linux to determine exact start + size
of the root partition. This is required because GPT stores a secondary
partition table at the end of the disk, so we can't just have
mkfs.ext4 create the filesystem until the end of the disk.
- (Unrelated to any EFI changes) Since we're depending on the
`-E offset=X` option to mkfs which is only supported by e2fsprogs,
disallow any attempts of creating partitioned disk images where
the root filesystem is not ext4.
Currently libvirt requires two qemu derivations: qemu and qemu_kvm which is just a truncated version of qemu (defined as qemu.override { hostCpuOnly = true; }).
This patch exposes an option virtualisation.libvirtd.qemuPackage which allows to choose which package to use:
* pkgs.qemu_kvm if all your guests have the same CPU as host, or
* pkgs.qemu which allows to emulate alien architectures (for example ARMV7L on X86_64), or
* a custom derivation
virtualisation.libvirtd.enableKVM option is vague and could be deprecate in favor of virtualisation.libvirtd.qemuPackage, anyway it does allow to enable/disable kvm.
Without this, when you've enabled networkmanager and start a
nixos-container the container will briefly have its specified IP
address but then networkmanager starts managing it causing the IP
address to be dropped.
This is required on the ThunderX CPUs on the Packet.net Type-2A
machines that have a GICv3. For some reason the default is to create a
GICv2 independent of the host hardware...
This is required by the new c5.* instance types.
Note that this changes disk names from /dev/xvd* to
/dev/nvme0n*. Amazon Linux has a udev rule that calls a Python script
named "ec2nvme-nsid" to create compatibility symlinks. We could use
that, but it would mean adding Python to the AMI closure...
Unlike pathsFromGraph, on Nix 1.12, this function produces a
registration file containing correct NAR hash/size information.
https://hydra.nixos.org/build/62832723
-s, --script: never prompts for user intervention
Sometimes the NixOS installer tests fail when they invoke parted, e.g.
https://hydra.nixos.org/build/62513826/nixlog/1. But instead of exiting
right there, the tests hang until the Nix builder times out (and kills
the build). With this change the tests would instead fail immediately,
which is preferred.
While at it, use "parted --script" treewide, so nobody gets build
timeout due to parted error (or misuse). (Only nixos/ use it, and only
non-interactive.)
A few instances already use the short option "-s", convert them to long
option "--short".
Container config example code mentions `postgresql` service, but the correct use of that service involves setting `system.stateVersion` option (as discovered in https://github.com/NixOS/nixpkgs/issues/30056).
The actual system state version is set randomly to 17.03 because I have no preferences here
There are currently two ways to build Openstack image. This just picks
best of both, to keep only one!
- Image is resizable
- Cloudinit is enable
- Password authentication is disable by default
- Use the same layer than other image builders (ec2, gce...)
Although it is quite safe to restart ```libvirtd``` when there are only ```qemu``` machines, in case if there are ```libvirt_lxc``` containers, a restart may result in putting the whole system into an odd state: the containers go on running but the new ```libvirtd``` daemons do not see them.
This allows to run the prune job periodically on a machine.
By default the if enabled the job is run once a week.
The structure is similar to how system.autoUpgrade works.
Use xmlstarlet to update the OVMF path on each startup, like we do for
<emulator>...qemu-kvm</emulator>.
A libvirt domain using UEFI cannot start if the OVMF path is garbage
collected/missing.
Instead of grep and sed, which is brittle.
(I don't know how to preserve the comment we currently add to say that
this line is auto-updated. But I don't think it adds much value, so I'm
not spending any effort on it.)
This commit adds the xen_4_8 package to be used instead of
xen (currently at 4.5.5):
* Add packages xen_4_8, xen_4_8-slim and xen_4_8-light
* Add packages qemu_xen_4_8 and qemu_xen_4_8-light to be used
with xen_4_8-slim and xen_4_8-light respectively.
* Add systemd to buildInputs of xen (it is required by oxenstored)
* Adapt xen service to work with the new version of xen
* Use xen-init-dom0 to initlilise dom0 in xen-store
* Currently, the virtualisation.xen.stored option is ignored
if xen 4.8 is used
OVMF{,CODE,VARS}.fd are now available in a dedicated fd output, greatly
reducing the closure in the common case where only those files are used (a
few MBs versus several hundred MBs for the full OVMF).
Note: it's unclear why `dontPatchELF` is now necessary for the build to
pass (on my end, at any rate) but it doesn't make much sense to run this
fixup anyway,
Note: my reading of xen's INSTALL suggests that --with-system-ovmf should
point directly to the OVMF binary. As such, the previous invocation was
incorrect (it pointed to the root of the OVMF tree). In any case, I have
only built xen with `--with-system-ovmf`, I have not tested it.
Fixes https://github.com/NixOS/nixpkgs/issues/25854
Closes https://github.com/NixOS/nixpkgs/pull/25855
Provide the option forwardDns in virtualisation.xen.bridge, which
enables forwarding of DNS queries to the default resolver, allowing
outside internet access for the xen guests.
The xen-bridge service accepts the option prefixLength, but does not
use it to set the actual netmask on the bridge. This commit makes
it set the correct netmask.
QEMU can allow guests to access more than one host core at a time.
Previously, this had to be done via ad-hoc arguments:
virtualisation.qemu.options = ["-smp 12"];
Now you can simply specify:
virtualisation.cores = 12;
Unfortunately, somewhere between 16.09 and 17.03, paravirtualized
instances stopped working. They hang at the pv-grub prompt
("grubdom>"). I tried reverting to a 4.4 kernel, reverting kernel
compression from xz to bzip2 (even though pv-grub is supposed to
support xz), and reverting the only change to initrd generation
(5a8147479e). Nothing worked so I'm
giving up.
Docker socket is world writable. This means any user on the system is
able to invoke docker command. (Which is equal to having a root access
to the machine.)
This commit makes socket group-writable and owned by docker group.
Inspired by
https://github.com/docker/docker/blob/master/contrib/init/systemd/docker.socket
Having fixed the Google Compute Engine image build process's copying
of store paths in PR #24264, I ran `nixos-rebuild --upgrade switch`...
and the GCE image broke again, because it sets the NixOS configuration
option for the sysctl variable `kernel.yama.ptrace_scope` to
`mkDefault "1"`, i.e., with override priority 1000, and now the
`sysctl` module sets the same option to `mkDefault "0"` (this was
changed in commit 86721a5f78).
This patch raises the override priority of the Google Compute Engine
image configuration's definition of the Yama sysctl option to 500
(still lower than the priority of an unmodified option definition).
I have tested that this patch allows the Google Compute Engine image
to again build successfully for me.
In `nixos/modules/virtualisation/google-compute-image.nix`, copy store
paths with `rsync -a` rather than `cp -prd`, because `rsync` seems
better able to handle the hard-links that may be present in the store,
whereas `cp` may fail to copy them.
I have tested that the Google Compute Engine image builds successfully
for me with this patch, whereas it did not without this patch.
This is the same fix applied for Azure images in commit
097ef6e435.
Fixes #23973.
We now make it happen later in the boot process so that multi-user
has already activated, so as to not run afoul of the logic in
switch-to-configuration.pl. It's not my favorite solution, but at
least it works. Also added a check to the VM test to catch the failure
so we don't break in future.
Fixes #23121
The initialization code is now a systemd service that explicitly
waits for network-online, so the occasional failure I was seeing
because the `nixos-rebuild` couldn't get anything from the binary
cache should stop. I hope!
fix #22709
Recent pvgrub (from Grub built with “--with-platform=xen”) understands
the Grub2 configuration format. Grub legacy configuration (menu.lst) is
ignored.
A very simple skeleton for now that doesn't attempt to model any of
the agent configuration, but we can grow it later. Tested and works
on an EC2 instance with ECS.
All the new options in detail:
Enable docker in multi-user.target make container created with restart=always
to start. We still want socket activation as it decouples dependencies between
the existing of /var/run/docker.sock and the docker daemon. This means that
services can rely on the availability of this socket. Fixes #11478#21303
wantedBy = ["multi-user.target"];
This allows us to remove the postStart hack, as docker reports on its own when
it is ready.
Type=notify
The following will set unset some limits because overhead in kernel's ressource
accounting was observed. Note that these limit only apply to containerd.
Containers will have their own limit set.
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Upgrades may require schema migrations. This can delay the startup of dockerd.
TimeoutStartSec=0
Allows docker to create its own cgroup subhierarchy to apply ressource limits on
containers.
Delegate=true
When dockerd is killed, container should be not affected to allow
`live restore` to work.
KillMode=process