Currently if you want to properly chroot a systemd service, you could do
it using BindReadOnlyPaths=/nix/store or use a separate derivation which
gathers the runtime closure of the service you want to chroot. The
former is the easier method and there is also a method directly offered
by systemd, called ProtectSystem, which still leaves the whole store
accessible. The latter however is a bit more involved, because you need
to bind-mount each store path of the runtime closure of the service you
want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages.
However, this process is a bit tedious, so the changes here implement
this in a more generic way.
Now if you want to chroot a systemd service, all you need to do is:
{
systemd.services.myservice = {
description = "My Shiny Service";
wantedBy = [ "multi-user.target" ];
confinement.enable = true;
serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
};
}
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes script and {pre,post}Start) need to be in the chroot,
it can be specified using the confinement.packages option. By default
(which uses the full-apivfs confinement mode), a user namespace is set
up as well and /proc, /sys and /dev are mounted appropriately.
In addition - and by default - a /bin/sh executable is provided, which
is useful for most programs that use the system() C library call to
execute commands via shell.
Unfortunately, there are a few limitations at the moment. The first
being that DynamicUser doesn't work in conjunction with tmpfs, because
systemd seems to ignore the TemporaryFileSystem option if DynamicUser is
enabled. I started implementing a workaround to do this, but I decided
to not include it as part of this pull request, because it needs a lot
more testing to ensure it's consistent with the behaviour without
DynamicUser.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and doesn't
include/exclude the individual bind mounts or the tmpfs.
A quirk we do have right now is that systemd tries to create a /usr
directory within the chroot, which subsequently fails. Fortunately, this
is just an ugly error and not a hard failure.
The changes also come with a changelog entry for NixOS 19.03, which is
why I asked for a vote of the NixOS 19.03 stable maintainers whether to
include it (I admit it's a bit late a few days before official release,
sorry for that):
@samueldr:
Via pull request comment[1]:
+1 for backporting as this only enhances the feature set of nixos,
and does not (at a glance) change existing behaviours.
Via IRC:
new feature: -1, tests +1, we're at zero, self-contained, with no
global effects without actively using it, +1, I think it's good
@lheckemann:
Via pull request comment[2]:
I'm neutral on backporting. On the one hand, as @samueldr says,
this doesn't change any existing functionality. On the other hand,
it's a new feature and we're well past the feature freeze, which
AFAIU is intended so that new, potentially buggy features aren't
introduced in the "stabilisation period". It is a cool feature
though? :)
A few other people on IRC didn't have opposition either against late
inclusion into NixOS 19.03:
@edolstra: "I'm not against it"
@Infinisil: "+1 from me as well"
@grahamc: "IMO its up to the RMs"
So that makes +1 from @samueldr, 0 from @lheckemann, 0 from @edolstra
and +1 from @Infinisil (even though he's not a release manager) and no
opposition from anyone, which is the reason why I'm merging this right
now.
I also would like to thank @Infinisil, @edolstra and @danbst for their
reviews.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-477322127
[2]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-477548395
Currently, if you want to properly chroot a systemd service, you could
do it using BindReadOnlyPaths=/nix/store (which is not what I'd call
"properly", because the whole store is still accessible) or use a
separate derivation that gathers the runtime closure of the service you
want to chroot. The former is the easier method and there is also a
method directly offered by systemd, called ProtectSystem, which still
leaves the whole store accessible. The latter however is a bit more
involved, because you need to bind-mount each store path of the runtime
closure of the service you want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages. That's also what I did several times[1][2] in the
past.
However, this process got a bit tedious, so I decided that it would be
generally useful for NixOS, so this very implementation was born.
Now if you want to chroot a systemd service, all you need to do is:
{
systemd.services.yourservice = {
description = "My Shiny Service";
wantedBy = [ "multi-user.target" ];
chroot.enable = true;
serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
};
}
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes "script" and {pre,post}Start) need to be in the
chroot, it can be specified using the chroot.packages option. By
default (which uses the "full-apivfs"[3] confinement mode), a user
namespace is set up as well and /proc, /sys and /dev are mounted
appropriately.
In addition - and by default - a /bin/sh executable is provided as well,
which is useful for most programs that use the system() C library call
to execute commands via shell. The shell providing /bin/sh is dash
instead of the default in NixOS (which is bash), because it's way more
lightweight and after all we're chrooting because we want to lower the
attack surface and it should be only used for "/bin/sh -c something".
Prior to submitting this here, I did a first implementation of this
outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality
from systemd-lib.nix, just because it's only a single line.
However, I decided to just re-use the one from systemd here and
subsequently made it available when importing systemd-lib.nix, so that
the systemd-chroot implementation also benefits from fixes to that
functionality (which is now a proper function).
Unfortunately, we do have a few limitations as well. The first being
that DynamicUser doesn't work in conjunction with tmpfs, because it
already sets up a tmpfs in a different path and simply ignores the one
we define. We could probably solve this by detecting it and try to
bind-mount our paths to that different path whenever DynamicUser is
enabled.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and not the
individual bind mounts or our tmpfs. It would be helpful if systemd
would have a way to disable specific bind mounts as well or at least
have some way to ignore failures for the bind mounts/tmpfs setup.
Another quirk we do have right now is that systemd tries to create a
/usr directory within the chroot, which subsequently fails. Fortunately,
this is just an ugly error and not a hard failure.
[1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62
[2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124
[3]: The reason this is called "full-apivfs" instead of just "full" is
to make room for a *real* "full" confinement mode, which is more
restrictive even.
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix
Signed-off-by: aszlig <aszlig@nix.build>
This should make the composability of kernel configurations more straigthforward.
- now distinguish freeform options from tristate ones
- will look for a structured config in kernelPatches too
one can now access the structuredConfig from a kernel via linux_test.configfile.structuredConfig
in order to reinject it into another kernel, no need to rewrite the config from scratch
The following merge strategies are used in case of conflict:
-- freeform items must be equal or they conflict (mergeEqualOption)
-- for tristate (y/m/n) entries, I use the mergeAnswer strategy which takes the best available value, "best" being defined by the user (by default "y" > "m" > "n", e.g. if one entry is both marked "y" and "n", "y" wins)
-- if one item is both marked optional/mandatory, mandatory wins (mergeFalseByDefault)
Right now it's not at all obvious that one can override this option
using `services.logind.extraConfig`; we might as well add an option
for `killUserProcesses` directly so it's clear and documented.
The new reuse behaviour is cool and really useful but it breaks one of
my use cases. When using kexec, I have a script which will unlock the
disks in my initrd. However, do_open_passphrase will fail if the disk is
already unlocked.
The previous version contained a false positive case, where boot would
continue when the stage 2 init did not exist at all, and a false
negative case, where boot would stop if the stage 2 init was a symlink
which cannot be resolved in the initramfs root.
Fixes#49519.
Thanks @michas2 for finding and reporting the issue!
* journald: forward message to syslog by default if a syslog implementation is installed
* added a test to ensure rsyslog is receiving messages when expected
* added rsyslogd tests to release.nix
* Lets container@.service be activated by machines.target instead of
multi-user.target
According to the systemd manpages, all containers that are registered
by machinectl, should be inside machines.target for easy stopping
and starting container units altogether
* make sure container@.service and container.slice instances are
actually located in machine.slice
https://plus.google.com/112206451048767236518/posts/SYAueyXHeEX
See original commit: https://github.com/NixOS/systemd/commit/45d383a3b8
* Enable Cgroup delegation for nixos-containers
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
This is equivalent to enabling all accounting options on the systemd
process inside the system container. This means that systemd inside
the container is responsible for managing Cgroup resources for
unit files that enable accounting options inside. Without this
option, units that make use of cgroup features within system
containers might misbehave
See original commit: https://github.com/NixOS/systemd/commit/a931ad47a8
from the manpage:
Turns on delegation of further resource control partitioning to
processes of the unit. Units where this is enabled may create and
manage their own private subhierarchy of control groups below the
control group of the unit itself. For unprivileged services (i.e.
those using the User= setting) the unit's control group will be made
accessible to the relevant user. When enabled the service manager
will refrain from manipulating control groups or moving processes
below the unit's control group, so that a clear concept of ownership
is established: the control group tree above the unit's control
group (i.e. towards the root control group) is owned and managed by
the service manager of the host, while the control group tree below
the unit's control group is owned and managed by the unit itself.
Takes either a boolean argument or a list of control group
controller names. If true, delegation is turned on, and all
supported controllers are enabled for the unit, making them
available to the unit's processes for management. If false,
delegation is turned off entirely (and no additional controllers are
enabled). If set to a list of controllers, delegation is turned on,
and the specified controllers are enabled for the unit. Note that
additional controllers than the ones specified might be made
available as well, depending on configuration of the containing
slice unit or other units contained in it. Note that assigning the
empty string will enable delegation, but reset the list of
controllers, all assignments prior to this will have no effect.
Defaults to false.
Note that controller delegation to less privileged code is only safe
on the unified control group hierarchy. Accordingly, access to the
specified controllers will not be granted to unprivileged services
on the legacy hierarchy, even when requested.
The following controller names may be specified: cpu, cpuacct, io,
blkio, memory, devices, pids. Not all of these controllers are
available on all kernels however, and some are specific to the
unified hierarchy while others are specific to the legacy hierarchy.
Also note that the kernel might support further controllers, which
aren't covered here yet as delegation is either not supported at all
for them or not defined cleanly.
"machine.target" doesn't actually exist, it's misspelled version
of "machines.target". However, the "systemd-nspawn@.service"
unit already has a default dependency on "machines.target"
The improved lspci command shows all available ethernet controllers and
their kernel modules. Previously, the user had to provide the slot name
of a specific device.
The default value for journald's Storage option is "auto", which
determines whether to log to /var/log/journal based on whether that
directory already exists. So NixOS has been unconditionally creating
that directory in activation scripts.
However, we can get the same behavior by configuring journald.conf to
set Storage to "persistent" instead. In that case, journald will create
the directory itself if necessary.
Previously, the activation script was responsible for ensuring that
/etc/machine-id exists. However, the only time it could not already
exist is during stage-2-init, not while switching configurations,
because one of the first things systemd does when starting up as PID 1
is to create this file. So I've moved the initialization to
stage-2-init.
Furthermore, since systemd will do the equivalent of
systemd-machine-id-setup if /etc/machine-id doesn't have valid contents,
we don't need to do that ourselves.
We _do_, however, want to ensure that the file at least exists, because
systemd also uses the non-existence of this file to guess that this is a
first-boot situation. In that case, systemd tries to create some
symlinks in /etc/systemd/system according to its presets, which it can't
do because we've already populated /etc according to the current NixOS
configuration.
This is not necessary for any other activation script snippets, so it's
okay to do it after stage-2-init runs the activation script. None of
them declare a dependency on the "systemd" snippet. Also, most of them
only create files or directories in ways that obviously don't need the
machine-id set.
Evaluation error introduced in 599c4df46a.
There is only a "platformS" attribute in kexectools.meta, so let's use
this and from the code in the kexec module it operates on a list,
matching the corresponding platforms, so this seems to be the attribute
the original author intended.
Tested by building nixos/tests/kexec.nix on x86_64-linux and while it
evaluates now, the test still fails by timing out shortly after the
kexec:
machine: waiting for the VM to finish booting
machine# Cannot find the ESP partition mount point.
This however seems to be an unrelated issue and was also the case before
the commit mentioned above.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edolstra, @dezgeg
* acquire DHCP on the interfaces with networking.interface.$name.useDHCP == true or on all interfaces if networking.useDHCP == true (was only only "eth0")
* respect "mtu" if it was in DHCP answer (it happens in the wild)
* acquire and set up staticroutes (unlike others clients, udhcpc does not do the query by default); this supersedes https://github.com/NixOS/nixpkgs/pull/41829
Although double '/' in paths is not a problem for GRUB supplied with nixpkgs, sometimes NixOS's grub.conf read by external GRUB and there are versions of GRUB which fail
The background color option is self-explanatory.
The mode is either `normal` or `stretch`, they are as defined by GRUB,
where normal will put the image in the top-left corner of the menu, and
stretch is the default, where it stretches the image without
consideration for the aspect ratio.
* https://www.gnu.org/software/grub/manual/grub/grub.html#background_005fimage
This fixes an issue where setting both
`boot.loader.systemd-boot.editor` to `false` and
`boot.loader.systemd-boot.consoleMode` to any value would concatenate
the two configuration lines in the output, resulting in an invalid
`loader.conf`.
From reading the source I'm pretty sure it doesn't support multiple Yubikeys, hence
those options are useless.
Also, I'm pretty sure nobody actually uses this feature, because enabling it causes
extra utils' checks to fail (even before applying any patches of this branch).
As I don't have the hardware to test this, I'm too lazy to fix the utils, but
I did test that with extra utils checks commented out and Yubikey
enabled the resulting script still passes the syntax check.
Also reuse common cryptsetup invocation subexpressions.
- Passphrase reading is done via the shell now, not by cryptsetup.
This way the same passphrase can be reused between cryptsetup
invocations, which this module now tries to do by default (can be
disabled).
- Number of retries is now infinity, it makes no sense to make users
reboot when they fail to type in their passphrase.
Rather than special-casing the dns options in networkmanager.nix, use
the module system to let unbound and systemd-resolved contribute to
the newtorkmanager config.
find-libs is currently choking when it finds the dynamic linker
as a DT_NEEDED dependency (from glibc) and bails out like this
(as glibc doesn't have a RPATH):
Couldn't satisfy dependency ld-linux-x86-64.so.2
Actually the caller of find-libs ignores the exit status, so the issue
almost always goes unnoticed and happens to work by chance. But
additionally what happens is that indirect .so dependencies are
left out from the dependency closure calculation, which breaks
latest cryptsetup as libssl.so isn't found anymore.
F2FS is used on Raspberry Pi-like devices to enhance SD card performance. Allowing F2FS resizing would help in automatic deploying of SD card images without a Linux box to resize the file system offline.
This has been reported by @qknight in his Stack Overflow question:
https://stackoverflow.com/q/50678639
The correct way to override a single value would be to use something
like this:
systemd.services.nagios.serviceConfig.Restart = lib.mkForce "no";
However, this doesn't work because the check is applied for the attrsOf
type and thus the attribute values might still contain the attribute set
created by mkOverride.
The unitOption type however did already account for this, but at this
stage it's already too late.
So now the actual value is unpacked while checking the values of the
attribute set, which should allow us to override values in
serviceConfig.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edolstra, @qknight
When a package contains a directory in one of the systemd directories
(like flatpak does), it is symlinked into the *-units derivation.
Then later, the derivation will try to create the directory, which
will fail:
mkdir: cannot create directory '/nix/store/…-user-units/dbus.service.d': File exists
builder for '/nix/store/…-user-units.drv' failed with exit code 1
Closes: #33233
GRUB 2.0 supports png, jpeg and tga. This will use the image's suffix to
load the right module.
As jpeg module is named jpeg, jpg is renamed jpeg.
If the user uses wrong image suffix for an image, it wouldn't work anyway.
This will leave up to two additional left-over files in /boot/ if user switches
through all the supported file formats. The module already left the png
image if the user disabled the splash image.
This is apparent from the service file directory in plymouth:
├── multi-user.target.wants
│ ├── plymouth-quit.service -> ../plymouth-quit.service
│ └── plymouth-quit-wait.service -> ../plymouth-quit-wait.service
Leaving it unspecified caused gdm-wayland to crash on boot, see #39615.
The change made other display managers not quit plymouth properly however. By
removing "multi-user.target" from `plymouth-quit.after` this is resolved.
The isKexecable flag treated Linux without kexec as just a normal
variant, when it really should be treated as a special case incurring
complexity debt to support.
Resolved the following conflicts (by carefully applying patches from the both
branches since the fork point):
pkgs/development/libraries/epoxy/default.nix
pkgs/development/libraries/gtk+/3.x.nix
pkgs/development/python-modules/asgiref/default.nix
pkgs/development/python-modules/daphne/default.nix
pkgs/os-specific/linux/systemd/default.nix
This can be used to fix issues where udhcpc times out before
acquiring a lease. For example of these issues, see:
https://bugs.alpinelinux.org/issues/3105#note-8
Signed-off-by: Dino A. Dai Zovi <ddz@theta44.org>
At one point in my configuration I had:
boot.kernel.sysctl = {
# https://unix.stackexchange.com/questions/13019/description-of-kernel-printk-values
"kernel.printk" = "4 4 1 7";
};
which triggered:
error: The unique option `boot.kernel.sysctl.kernel.printk' is defined multiple times, in `/home/teto/dotfiles/nixpkgs/mptcp-unstable.nix' and `/home/teto/nixpkgs/nixos/modules/system/boot/kernel.nix'.
(use ‘--show-trace’ to show detailed location information)
Traceback (most recent call last):
File "/home/teto/nixops/scripts/nixops", line 984, in <module>
args.op()
File "/home/teto/nixops/scripts/nixops", line 406, in op_deploy
max_concurrent_activate=args.max_concurrent_activate)
File "/home/teto/nixops/nixops/deployment.py", line 1045, in deploy
self.run_with_notify('deploy', lambda: self._deploy(**kwargs))
File "/home/teto/nixops/nixops/deployment.py", line 1034, in run_with_notify
f()
File "/home/teto/nixops/nixops/deployment.py", line 1045, in <lambda>
self.run_with_notify('deploy', lambda: self._deploy(**kwargs))
File "/home/teto/nixops/nixops/deployment.py", line 985, in _deploy
self.configs_path = self.build_configs(dry_run=dry_run, repair=repair, include=include, exclude=exclude)
File "/home/teto/nixops/nixops/deployment.py", line 653, in build_configs
raise Exception("unable to build all machine configurations")
Exception: unable to build all machine configurations
This simple addition allows to override it.
This option, if set to true, enables fallbacking to an interactive
passphrase prompt when the specified keyFile is not found.
The default is false, which is compatible with previous behavior and
doesn't prevent unattended boot.
Regression introduced by 801c920e95.
Since then, the btrfsSimple subtest of the installer VM test fails with:
Btrfs did not return a path for the subvolume at /
The reason for this is that the output for "btrfs subvol show" has
changed between version 4.8.2 and 4.13.1.
For example the output of "btrfs subvol show /" in version 4.8.2 was:
/ is toplevel subvolume
In version 4.13.1, the output now is the following and thus the regular
expressions used in nixos-generate-config.pl and install-grub.pl now
match (which results in the error mentioned above):
/
Name: <FS_TREE>
UUID: -
Parent UUID: -
Received UUID: -
Creation time: -
Subvolume ID: 5
Generation: 287270
Gen at creation: 0
Parent ID: 0
Top level ID: 0
Flags: -
Snapshot(s):
In order to fix this I've changed nixos-generate-config.pl and
install-grub.pl, because both use "btrfs subvol show" in a similar vein,
so the regex for parsing the output now doesn't match anymore whenever
the volume path is "/", which should result in the same behaviour as we
had with btrfs-progs version 4.8.2.
Tested against the btrfsSimple, btrfsSubvols and btrfsSubvolDefault
subtests of the installer VM test and they all succeed now.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
* the keyboard modules in all-hardware.nix are already defaults of
boot.initrd.availableKernelModules
* ide modules, hid_lenovo_tpkbd and scsi_wait_scan have been removed
because they're not available anymore
* i8042 was a duplicate (see few lines abowe)
the systemd.unit(5) discussion of wantedBy and requiredBy is in the
[Install] section, and thus focused on stateful 'systemctl enable'.
so, clarify that in NixOS, wantedBy & requiredBy are still what most
users want, and not to be confused with enabled.
Boot fails when a keyfile is configured for all encrypted filesystems
and no other luks devices are configured. This is because luks support is only
enabled in the initrd, when boot.initrd.luks.devices has entries. When a
fileystem has a keyfile configured though, it is setup by a custom
command, not by boot.initrd.luks.
This commit adds an internal config flag to enable luks support in the
initrd file, even if there are no luks devices configured.
Grub configs include the NixOS version and date they were built, now
systemd can have fun too:
version Generation 99 NixOS 17.03.1700.51a83266d1, Linux Kernel 4.9.43, Built on 2017-08-30
version Generation 100 NixOS 17.03.1700.51a83266d1, Linux Kernel 4.9.43, Built on 2017-08-30
version Generation 101 NixOS 17.03.1700.51a83266d1, Linux Kernel 4.9.43, Built on 2017-08-31
version Generation 102 NixOS 17.03.1700.51a83266d1, Linux Kernel 4.9.43, Built on 2017-09-01
version Generation 103 NixOS 17.03.1700.51a83266d1, Linux Kernel 4.9.43, Built on 2017-09-02
version Generation 104 NixOS 17.09beta41.1b8c7786ee, Linux Kernel 4.9.46, Built on 2017-09-02
version Generation 105 NixOS 17.09.git.1b8c778, Linux Kernel 4.9.46, Built on 2017-09-02
The message buffer of the kernel lists
> Please run 'e2fsck -f /dev/disk/by-label/nixos' first.
as the output of the command `resize2fs "$device"`.
This fixesNixOS/nixpkgs#26910.
old version of blkid used to output version information including libblkid version
when invoked with --help parameter
new version does not output libblkid version when invoked with --help parameter
fix is to invoke blkid with -V parameter to output version including libblkid in both cases
* systemd-boot-builder.py: add support for profiles
This will also list the generations of other profiles than `system` in
the boot menu. See the documentation of the `--profile-name` option of
nixos-rebuild for more information on profiles.
* Fix errors introduced by previous commit
Or else `services.udev.packages = [ bcache-tools ]` cannot be used.
To not break bcache in the initrd I'm modifying this in stage-1.nix:
- --replace /bin/sh ${extraUtils}/bin/sh
+ --replace ${bash}/bin/sh ${extraUtils}/bin/sh
Reasoning behind that change:
* If not modifying the /bin/sh pattern in any way, it will also match
${bash}/bin/sh, creating a broken path like
/nix/store/HASH-bash/nix/store/HASH-bash/bin/sh in the udev rule file.
* The addition of /bin/sh was done in 775f381a9e
("stage-1: add bcache support"). It seems somewhat plausible that
no new users have appeared since then and we can take this opportunity
to back out of this change without much fear of regressions.
If there _are_ regressions, they should be in the form of build time
errors, not runtime (boot), due to how the udev rule output is checked
for invalid path references. So low risk, IMHO.
* An alternative approach could be to copy the /bin/sh substitute rule
over to the non-initrd udev rules implementation in NixOS, but I think
this way is better:
- The rules file comes with a working path out of the box.
- We can use more precise pattern matching when modifying the udev
rules for the initrd.
The default font is unreadably small on some hidpi displays. This
makes it possible to specify a TrueType or OpenType font at any point
size, and it will automatically be converted to the format the Grub
uses.