Since fat32 provides little recovery facilities after a crash,
it can leave the system in an unbootable state, when a crash/outage
happens shortly after an update. To decrease the likelihood of this
event sync the efi filesystem after each update.
Someone on IRC wanted to boot Fedora from another disk. While I'm not
too familiar with UEFI booting in conjunction with GRUB2 it took some
time to get it to work.
So in order to safe others from frustration I'm adding this as another
example to the extraEntries option.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This has surfaced since d990aa7163.
The "simpleUefiGummiboot" installer test fails since this commit,
because that commit introduced a small check to verify whether the store
was altered.
While installing NixOS for the first time, the store is usually in
/mnt/nix/store and without the read-only bind mount that's preventing
programs from altering the store.
So after nixos-install is done creating the system closure and setting
it as the active system profile, the bootloader is written from the
closure inside the chroot. The systemd-boot-builder is invoked during
this step, which adds .pyc files for various Python modules of the
Python 3 store path, which in turn invalidates the hash of the Python 3
store path itself.
At the time the system is booted up again, the nix-store is verified and
fails with something like this:
path /nix/store/zvm545rqc4d97caqq9h7344bnd06jhzb-python3-3.5.3 was
modified! expected hash
b2c975f4b8d197443fbb09690fb3f6545e165dd44c9309d7d6df2fce0579ebeb, got
bccca19f39c9d26d857ccf1fb72818b2b817967e6d497a25a1283e36ed0acf01
Running the interpreter with the -B argument prevents Python from
writing those byte code files:
https://docs.python.org/3/using/cmdline.html#cmdoption-B
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This reverts commit c2b56626f1.
It broke creating the manual. I suspect the descriptions are
auto-wrapped by <para> and </para>.
We've been through this already in 3af715af90.
/cc #24978, @zraexy, @Mic92.
The LUKS header can be on another device (e.g. a USB stick). In my case
it can take up to two seconds until the partition on my USB stick is
available (i.e. the decryption fails without this patch). This will also
remove some redundancy by providing the shell function `wait_target` and
slightly improve the output (one "." per second and a success/failure
indication after 10 seconds instead of always printing "ok").
Restarting them is useless since the filesystem is already
checked. Worse, restarting them causes the filesystem to be unmounted.
Also remove an override for systemd-rkill@.service which no longer
exists.
When dhcpcd instead of networkd is used, the network-online.target behaved
the same as network.target, resulting in broken services that need a working
network connectivity when being started.
This commit makes dhcpcd wait for a lease and makes it wanted by
network-online.target. In turn, network-online.target is now wanted by
multi-user.target, so it will be activated at every boot.
Using toJSON on a string value works because the allowed JSON escape
sequences is almost a subset of the systemd allowed escape sequences.
The only exception is `\/` which JSON allows but systemd doesn't.
Luckily this sequence isn't required and toJSON don't produce it making
the result valid for systemd consumption.
Examples of things that this fixes are environment variables with double
quotes or newlines.
Since systemd version 232 the install subcommand of bootctl opens the
loader.conf with fopen() modes "wxe", where the "e" stands for
exclusive, so the call will fail if the file exists.
For installing the boot loader just once this is fine, but if we're
using NIXOS_INSTALL_BOOTLOADER on a systemd where the bootloader is
already present this will fail.
Exactly this is done within the simpleUefiGummiboot installer test,
where nixos-install is called twice and thus the bootloader is also
installed twice, resulting in an error during the fopen call:
Failed to open loader.conf for writing: File exists
Removing the file prior to calling bootctl should fix this.
I've tested this using the installer.simpleUefiGummiboot test and it now
succeeds.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @edolstra, @shlevy, @mic92
Fixes: #22925
This leads to the following error when trying to install a new machine
where the machine ID wasn't yet initialized during boot:
Failed to get machine did: No such file or directory
In addition this was also detected by the simpleUefiGummiboot installer
test.
So let's generate a fallback machine ID by using
systemd-machine-id-setup before actually running bootctl.
Tested this by running the installer.simpleUefiGummiboot test, it still
fails but not because of the machine ID.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @edolstra, @shlevy, @mic92
Fixes: #22561
Since the bonds interface changed to a lot more possible values we create a
mapping of kernel bond attribute names and values to networkd attributes.
Those match for the most part, but have to transformed slightly.
There is also an assert that unknown options won’t slip through silently.
The Raspberry Pi boot loader was deleting all xx-initrd text files
(which simply contain the path to the actual initrd files) just after
having created them. The code was actually trying to delete real,
obsolete initrd files, which are named <hash>-initrd-initrd (after path
cleaning), but the glob was catching the other files as well.
This reverts commit 712e62c260.
This commit broke NixOS containers. Systemd wouldn't detect if a container
started successfully and would kill it again after a grace period.
Additionally this prints mount errors due to already mounted filesystems
at boot.
This code in amazon-image.nix:
if mountFS "$device" "$mp" "" auto; then
if [ -z "$diskForUnionfs" ]; then diskForUnionfs="$mp"; fi
fi
relies on mountFS to return a zero exit status if mounting
succeeds. But the lustrateRoot check in mountFS was causing a non-zero
exit status. As a result /disk0 would be mounted, but not used for
/tmp.
(cherry picked from commit d082ed8c35dec48aee2afd1303b3c8b2a1b242b0)
networkd options are always correct or up to date. This option allows to by
pass type checking. It is also easier to write because examples can be just copy
and paste from manpages.
Networkd units can contain secrets. In future also wireguard vpn will be supported by
networkd. To avoid leakage of private keys, those could be then also put outside
of the /nix/store
Having a writeable /etc/systemd/network also allows to quick fix network issues,
when upgrading `nixos-rebuild switch` would require network on its own (due
updates).
This reverts commit 656cc3acaf because it
causes building the manual to fail:
$ nixos-rebuild build
...
building path(s) ‘/nix/store/s9y5z78z5pssvmixcmv9ix13gs8xj87f-manual-olinkdb’
Writing /nix/store/s9y5z78z5pssvmixcmv9ix13gs8xj87f-manual-olinkdb/manual.db for book(book-nixos-manual)
./man-pages.xml:625: element para: Relax-NG validity error : Did not expect element para there
./man-pages.xml:3: element variablelist: Relax-NG validity error : Element refsection has extra content: variablelist
./man-pages.xml:29: element refsection: Relax-NG validity error : Element refentry has extra content: refsection
./man-pages.xml:3: element reference: Relax-NG validity error : Element reference failed to validate content
./man-pages.xml fails to validate
CC @cleverca22, @Mic92
- most nixos user only require time synchronisation,
while ntpd implements a battery-included ntp server (1,215 LOCs of C-Code vs 64,302)
- timesyncd support ntp server per interface (if configured through dhcp for instance)
- timesyncd is already included in the systemd package, switching to it would
save a little disk space (1,5M)
Using the --force option on GRUB isn't recommended, but there are very
specific instances where it makes sense. One example is installing on a
partitionless disk.
Thanks to @NeQuissimus in a5c1985fef for
updating busybox, which since version 1.25 doesn't allow local variables
outside of functions anymore (which is the desired behaviour).
See the following upstream commit of busybox which is the change that
let's this problem surface:
https://git.busybox.net/busybox/commit/?id=ef2386b80abfb22ccb697ddbdd4047aacc395c50
So this has been an error I've made on my end in
67223ee205, because I originally had a
function for killing the processes but desired to inline it because it's
only used in one place.
This fixes the boot-stage1 NixOS test.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
For some reason, between Linux 4.4.19 and 4.4.20, the atkbd and libps2
kernel modules lost their dependency on i8042 in modules.dep, causing
i8042 not to be included in the initrd. This breaks keyboard in the
initrd, in turn breaking LUKS.
This only happens on the 16.03 branch; on 16.09, it appears i8042 is
pulled into the initrd anyway (through some other dependency,
presumably). But let's include it explicitly.
http://hydra.nixos.org/build/40468431
This makes it easy to specify kernel patches:
boot.kernelPatches = [ pkgs.kernelPatches.ubuntu_fan_4_4 ];
To make the `boot.kernelPatches` option possible, this also makes it
easy to extend and/or modify the kernel packages within a linuxPackages
set. For example:
pkgs.linuxPackages.extend (self: super: {
kernel = super.kernel.override {
kernelPatches = super.kernel.kernelPatches ++ [
pkgs.kernelPatches.ubuntu_fan_4_4
];
};
});
Closes #15095
This ensures that most "trivial" derivations used to build NixOS
configurations no longer depend on GCC. For commands that do invoke
gcc, there is runCommandCC.
This allows us to define system user targets in addition to the existing
services, timers and sockets.
Furthermore, we add a top-level configuration keyword:
- Documentation
When Grub is to be used with UEFI, it is not going to write to any MBR
of any disk. As such, it is safe to use multiple "nodev" device entries
when mirroring the ESP partition to multiple disks.
E.g.:
```
boot.loader.grub = {
enable = true;
version = 2;
zfsSupport = true;
efiSupport = true;
mirroredBoots = [
{ devices = [ "nodev" ]; path = "/boot1"; efiSysMountPoint = "/boot1"; }
{ devices = [ "nodev" ]; path = "/boot2"; efiSysMountPoint = "/boot2"; }
{ devices = [ "nodev" ]; path = "/boot3"; efiSysMountPoint = "/boot3"; }
];
};
boot.loader.efi.canTouchEfiVariables = true;
```
Fixes #18584
All swap device option sets "have" a label, it's just that sometimes it's
undefined. Because we set a `device` attribute when we have a label anyway it's
ok to just check device prefix.
Fixes #18891.
See #18319 for details. Starting network-online.target manually does not
work as it hangs indefinitely.
Additionally, don't treat avahi and dhcpcd special and sync their systemd units
with the respective upstream suggestion.
Systemd upstream provides targets for networking. This also includes a target network-online.target.
In this PR I remove / replace most occurrences since some of them were even wrong and could delay startup.
This partially reverts commit ab9537ca22.
From the manpage of systemd-nspawn(1):
Note that systemd-nspawn will mount file systems private to the
container to /dev, /run and similar.
Testing this in a shell turns out:
$ sudo systemd-nspawn --bind-ro=/nix/store "$(readlink "$(which ls)")" /proc
Spawning container aszlig on /home/aszlig.
Press ^] three times within 1s to kill container.
/etc/localtime does not point into /usr/share/zoneinfo/, not updating
container timezone.
1 execdomains kpageflags stat
acpi fb loadavg swaps
asound filesystems locks sys
buddyinfo fs meminfo sysrq-trigger
bus interrupts misc sysvipc
cgroups iomem modules thread-self
cmdline ioports mounts timer_list
config.gz irq mtrr timer_stats
consoles kallsyms net tty
cpuinfo kcore pagetypeinfo uptime
crypto key-users partitions version
devices keys scsi vmallocinfo
diskstats kmsg self vmstat
dma kpagecgroup slabinfo zoneinfo
driver kpagecount softirqs
Container aszlig exited successfully.
So the test on whether PID 1 exists in /proc is enough, because if we
use PID namespaces there actually _is_ a PID 1 (as shown above) and the
special file systems are already mounted. A test on the $containers
variable actually mounts them twice.
This unbreaks NixOS containers and I've tested this against the
containers-imperative NixOS test.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @rickynils, @shlevy, @edolstra
Before this commit updating /var/setuid-wrappers/ folder introduced
a small window where NixOS activation scripts could be terminated
and resulted into empty /var/setuid-wrappers/ folder.
That's very unfortunate because one might lose sudo binary.
Instead we use two atomic operations mv and ln (as described in
https://axialcorps.com/2013/07/03/atomically-replacing-files-and-directories/)
to achieve atomicity.
Since /var/setuid-wrappers is not a directory anymore, tmpfs mountpoints
were removed in installation scripts and in boot process.
Tested:
- upgrade /var/setuid-wrappers/ from folder to a symlink
- make sure /run/setuid-wrappers-dirs/ legacy symlink is really deleted
Both btrfs-progs and utillinux are ~5MB, we may discuss in future
to handle this better but I see no better way at the moment than
increaing purity in the install process.
OnCalendar entrys can be specified multiple times in a systemd timer, to
make more complex scheduling possible.
Tested by manually checking the timer generated by the following:
systemd = {
services.huhu = {
description = "meh";
wantedBy = [ "default.target" ];
serviceConfig.ExecStart = "/bin/sh -c 'printf HUHU!'";
startAt = [ "*:*:0/30" "*:0/1:15" ];
};
};
It prints HUHU to the log at seconds 0, 15 and 30 of each minute.
A new internal config option `fileSystems.<name>.early` is added to indicate
that the filesystem needs to be loaded very early (i.e. in initrd). They are
transformed to a shell script in `system.build.earlyMountScript` with calls to
an undefined `specialMount` function, which is expected to be caller-specific.
This option is used by stage-1, stage-2 and activation script to set up and
remount those filesystems. Options for them are updated according to systemd
defaults.
lustrate /ˈlʌstreɪt/ verb.
purify by expiatory sacrifice, ceremonial washing, or some other
ritual action.
- sudo touch /etc/NIXOS_LUSTRATE
⇒ on next reboot, during stage 1, everything but /nix and /boot
is moved to /old-root
- echo "etc/passwd" | sudo tee -a /etc/NIXOS_LUSTRATE
⇒ on next reboot, during stage 1, everything but /nix and /boot
is moved to /old-root; except /etc/passwd is copied back.
Useful for installing NixOS in place on another distro. For instance:
$ nix-env -iE '_: with import <nixpkgs/nixos> { configuration = {}; }; with config.system.build; [ nixos-generate-config manual.manpages ]'
$ sudo mkdir /etc/nixos
$ sudo `which nixos-generate-config`
… edit the configuration files in /etc/nixos using man configuration.nix
if needed
maybe add: users.extraUsers.root.initialHashedPassword = "" ?
… Build the entire NixOS system and link it to the system profile:
$ nix-env -p /nix/var/nix/profiles/system -f '<nixpkgs/nixos>' -A system --set
… If you were using a single user install:
$ sudo chown -R 0.0 /nix
… NixOS is about to take over
$ sudo touch /etc/NIXOS
$ sudo touch /etc/NIXOS_LUSTRATE
… Let's keep the configuration files we just created
$ echo etc/nixos | sudo tee -a /etc/NIXOS_LUSTRATE
$ sudo mv -v /boot /boot.bak &&
sudo /nix/var/nix/profiles/system/bin/switch-to-configuration boot
$ sudo reboot
… NixOS boots, Stage 1 moves all the old distro stuff in /old-root.
The builder has this convoluted `while` loop which just replicates
`readlink -e`. I'm sure there was a reason at one point, because the
loop has been there since time immemorial. It kept getting copied
around, I suspect because nobody bothered to understand what it actually
did.
Incidentally, this fixes #17513, but I have no idea why.
We currently only allow upstream's default of "reboot.target" due to the
way the symlinks are initialized. I made this configurable similar to the
default unit.
This reverts commit c69c76ca7e.
This patch was messed up during a rebase -- the commit title doesn't match what
it really does at all (it is actually a broken attempt to get LUKS passphrase
prompts in Plymouth).
A disabled systemd service with a "startAt" attribute, like this:
systemd.services.foo-service = {
enable = false;
startAt = "*-*-* 05:15:00";
...
};
will cause the following errors in the system journal:
systemd[1]: foo-service.timer: Refusing to start, unit to trigger not loaded.
systemd[1]: Failed to start foo-service.timer.
Fix it by not generating the corresponding timer unit when the service
is disabled.
Previously, the value from stdenv.platform.kernelDTB was used. That
doesn't work well if both kinds (DTB and non-DTB) of generations exist
in the system profile.
':' is currently used as separator in /boot/grub/state for the list of
devices GRUB should be installed to. The problem is that ':' itself may
appear in a device path:
/dev/disk/by-id/usb-SanDisk_Cruzer_20043512300546C0B317-0:0
With such a path, NixOS will install GRUB *every* time, because it
thinks the configuration differs from the state file (due to the wrong
list split). Fix it by using ',' as separator.
For existing systems with GRUB installed on multiple devices, this
change means that GRUB will be installed one extra time.
When displaying a warning about a removed Option we should always
include reasoning why it was removed and how to get the same
functionality without it.
Introduces such a description argument and patches occurences (mostly
with an empty string).
startGnuPGAgent: further notes on replacement
This allows setting options for the same LUKS device in different
modules. For example, the auto-generated hardware-configuration.nix
can contain
boot.initrd.luks.devices.crypted.device = "/dev/disk/...";
while configuration.nix can add
boot.initrd.luks.devices.crypted.allowDiscards = true;
Also updated the examples/docs to use /disk/disk/by-uuid instead of
/dev/sda, since we shouldn't promote the use of the latter.
Unfortunately, pkill doesn't distinguish between kernel and user space
processes, so we need to make sure we don't accidentally kill kernel
threads.
Normally, a kernel thread ignores all signals, but there are a few that
do. A quick grep on the kernel source tree (as of kernel 4.6.0) shows
the following source files which use allow_signal():
drivers/isdn/mISDN/l1oip_core.c
drivers/md/md.c
drivers/misc/mic/cosm/cosm_scif_server.c
drivers/misc/mic/cosm_client/cosm_scif_client.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
drivers/staging/rtl8188eu/core/rtw_cmd.c
drivers/staging/rtl8712/rtl8712_cmd.c
drivers/target/iscsi/iscsi_target.c
drivers/target/iscsi/iscsi_target_login.c
drivers/target/iscsi/iscsi_target_nego.c
drivers/usb/atm/usbatm.c
drivers/usb/gadget/function/f_mass_storage.c
fs/jffs2/background.c
fs/lockd/clntlock.c
fs/lockd/svc.c
fs/nfs/nfs4state.c
fs/nfsd/nfssvc.c
While not all of these are necessarily kthreads and some functionality
may still be unimpeded, it's still quite harmful and can cause
unexpected side-effects, especially because some of these kthreads are
storage-related (which we obviously don't want to kill during bootup).
During discussion at #15226, @dezgeg suggested the following
implementation:
for pid in $(pgrep -v -f '@'); do
if [ "$(cat /proc/$pid/cmdline)" != "" ]; then
kill -9 "$pid"
fi
done
This has a few downsides:
* User space processes which use an empty string in their command line
won't be killed.
* It results in errors during bootup because some shell-related
processes are already terminated (maybe it's pgrep itself, haven't
checked).
* The @ is searched within the full command line, not just at the
beginning of the string. Of course, we already had this until now, so
it's not a problem of his implementation.
I posted an alternative implementation which doesn't suffer from the
first point, but even that one wasn't sufficient:
for pid in $(pgrep -v -f '^@'); do
readlink "/proc/$pid/exe" &> /dev/null || continue
echo "$pid"
done | xargs kill -9
This one spawns a subshell, which would be included in the processes to
kill and actually kills itself during the process.
So what we have now is even checking whether the shell process itself is
in the list to kill and avoids killing it just to be sure.
Also, we don't spawn a subshell anymore and use /proc/$pid/exe to
distinguish between user space and kernel processes like in the comments
of the following StackOverflow answer:
http://stackoverflow.com/a/12231039
We don't need to take care of terminating processes, because what we
actually want IS to terminate the processes.
The only point where this (and any previous) approach falls short if we
have processes that act like fork bombs, because they might spawn
additional processes between the pgrep and the killing. We can only
address this with process/control groups and this still won't save us
because the root user can escape from that as well.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Fixes: #15226