This effectively disables nscd's built-in hosts cache, which turns out
to be erratic in some cases.
We only use nscd these days as a more ABI-neutral NSS dispatcher
mechanism.
Local caching should still be possible with local resolvers in
/etc/resolv.conf (via the `dns` NSS module), or without local resolvers
via systemd-networkd (via the `resolve` nss module)
We don't set enable-cache to no due to
https://github.com/NixOS/nixpkgs/pull/50316#discussion_r241035226.
NixOS usually needs nscd just to have a single place where
LD_LIBRARY_PATH can be set to include all NSS modules, but nscd is also
useful if some of the NSS modules need to read files which are only
accessible by root.
For example, nixos/modules/config/ldap.nix needs this when
users.ldap.enable = true;
users.ldap.daemon.enable = false;
and users.ldap.bind.passwordFile exists. In that case, the module
creates an /etc/ldap.conf which is only readable by root, but which the
NSS module needs to read in order to find out what LDAP server to
connect to and with what credentials.
If nscd is started as root and configured with the server-user option in
nscd.conf, then it gives each NSS module the opportunity to initialize
itself before dropping privileges. The initialization happens in the
glibc-internal __nss_disable_nscd function, which pre-loads all the
configured NSS modules for passwd, group, hosts, and services (but not
netgroup for some reason?) and, for each loaded module, calls an init
function if one is defined. After that finishes, nscd's main() calls
nscd_init() which ends by calling finish_drop_privileges().
There are provisions in systemd for using DynamicUser with a service
which needs to drop privileges itself, so this patch does that.
These options were being set to the same value as the defaults that are
hardcoded in nscd. Delete them so it's clear which settings are actually
important for NixOS.
One exception is `threads 1`, which is different from the built-in
default of 4. However, both values are equivalent because nscd forces
the number of threads to be at least as many as the number of kinds of
databases it supports, which is 5.
nscd doesn't create any files outside of /run/nscd unless the nscd.conf
"persistent" option is used, which we don't do by default. Therefore it
doesn't matter what UID/GID we run this service as, so long as it isn't
shared with any other running processes.
/run/nscd does need to be owned by the same UID that the service is
running as, but systemd takes care of that for us thanks to the
RuntimeDirectory directive.
If someone wants to turn on the "persistent" option, they need to
manually configure users.users.nscd and systemd.tmpfiles.rules so that
/var/db/nscd is owned by the same user that nscd runs as.
In an all-defaults boot.isContainer configuration of NixOS, this removes
the only user which did not have a pre-assigned UID.
Systemd provides an option for allocating DynamicUsers
which we want to use in NixOS to harden service configuration.
However, we discovered that the user wasn't allocated properly
for services. After some digging this turned out to be, of course,
a cache inconsistency problem.
When a DynamicUser creation is performed, Systemd check beforehand
whether the requested user already exists statically. If it does,
it bails out. If it doesn't, systemd continues with allocating the
user.
However, by checking whether the user exists, nscd will store
the fact that the user does not exist in it's negative cache.
When the service tries to lookup what user is associated to its
uid (By calling whoami, for example), it will try to consult
libnss_systemd.so However this will read from the cache and tell
report that the user doesn't exist, and thus will return that
there is no user associated with the uid. It will continue
to do so for the cache duration time. If the service
doesn't immediately looks up its username, this bug is not
triggered, as the cache will be invalidated around this time.
However, if the service is quick enough, it might end up
in a situation where it's incorrectly reported that the
user doesn't exist.
Preferably, we would not be using nscd at all. But we need to
use it because glibc reads nss modules from /etc/nsswitch.conf
by looking relative to the global LD_LIBRARY_PATH. Because LD_LIBRARY_PATH
is not set globally (as that would lead to impurities and ABI issues),
glibc will fail to find any nss modules.
Instead, as a hack, we start up nscd with LD_LIBRARY_PATH set
for only that service. Glibc will forward all nss syscalls to
nscd, which will then respect the LD_LIBRARY_PATH and only
read from locations specified in the NixOS config.
we can load nss modules in a pure fashion.
However, I think by accident, we just copied over the default
settings of nscd, which actually caches user and group lookups.
We already disable this when sssd is enabled, as this interferes
with the correct working of libnss_sss.so as it already
does its own caching of LDAP requests.
(See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/usingnscd-sssd)
Because nscd caching is now also interferring with libnss_systemd.so
and probably also with other nsss modules, lets just pre-emptively
disable caching for now for all options related to users and groups,
but keep it for caching hosts ans services lookups.
Note that we can not just put in /etc/nscd.conf:
enable-cache passwd no
As this will actually cause glibc to _not_ forward the call to nscd
at all, and thus never reach the nss modules. Instead we set
the negative and positive cache ttls to 0 seconds as a workaround.
This way, Glibc will always forward requests to nscd, but results
will never be cached.
Fixes #50273