Age | Commit message (Collapse) | Author |
|
Add an "invocation ID" concept to the service manager
|
|
|
|
|
|
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
|
|
This way, we can make use of this in other code, too.
|
|
Add seccomp support for the s390 architecture (31-bit and 64-bit)
to systemd.
This requires libseccomp >= 2.3.1.
|
|
ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
|
|
Network file dropins
|
|
This way we don't have to create a nulstr just to unpack it in a moment.
|
|
In preparation for adding a version which takes a strv.
|
|
condition: ignore nanoseconds in timestamps for ConditionNeedsUpdate=
Fixes #4130.
|
|
to prevent false-positives
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=90192 and #4130
for real. Also, remove timestamp check in update-done.c altogether since
the whole operation is idempotent.
|
|
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
|
|
https://bugzilla.redhat.com/show_bug.cgi?id=1374371
When root was empty or equal to "/", chroot_symlinks_same was called with
root==NULL, and strjoina returned "", so the code thought both paths are equal
even if they were not. Fix that by always providing a non-null first argument
to strjoina.
|
|
One trailing dot is valid, but more than one isn't. This also fixes glibc's
posix/tst-getaddrinfo5 test.
Fixes #3978.
|
|
In https://github.com/systemd/systemd/pull/4004 , a runtime detection
method for seccomp was added. However, it does not detect the case
where CONFIG_SECCOMP=y but CONFIG_SECCOMP_FILTER=n. This is possible
if the architecture does not support filtering yet.
Add a check for that case too.
While at it, change get_proc_field usage to use PR_GET_SECCOMP prctl,
as that should save a few system calls and (unnecessary) allocations.
Previously, reading of /proc/self/stat was done as recommended by
prctl(2) as safer. However, given that we need to do the prctl call
anyway, lets skip opening, reading and parsing the file.
Code for checking inspired by
https://outflux.net/teach-seccomp/autodetect.html
|
|
This patch allows to configure AgeingTimeSec, Priority and DefaultPVID for
bridge interfaces.
|
|
|
|
permit bus clients to pin units to avoid automatic GC
|
|
Fixes #3882
|
|
bus_connect_transport() is exclusively used from our command line tools, hence
let's set exit-on-disconnect for all of them, making behaviour a bit nicer in
case dbus-daemon goes down.
|
|
|
|
Make sure we return proper errors for types not understood yet.
|
|
Instead of ignoring empty strings retrieved via the bus, treat them as NULL, as
it's customary in systemd.
|
|
Let's make sure we can read the exit code/status properties exposed by PID 1
properly. Let's reuse the existing code for unsigned fields, as we just use it
to copy words around, and don't calculate it.
|
|
A lot of basic code wants to know the stack size, and it is safe if they do,
hence let's permit getrlimit() (but not setrlimit()) by default.
See: #3970
|
|
When told to enable a template unit, and the DefaultInstance specified in that
unit was masked, we would do this. Such a unit cannot be started or loaded, so
reporting successful enabling is misleading and unexpected.
$ systemctl mask getty@tty1
Created symlink /etc/systemd/system/getty@tty1.service → /dev/null.
$ systemctl --root=/ enable getty@tty1
(unchanged)
Failed to enable unit, unit /etc/systemd/system/getty@tty1.service is masked.
$ systemctl --root=/ enable getty@
(before)
Created symlink /etc/systemd/system/getty.target.wants/getty@tty1.service → /usr/lib/systemd/system/getty@.service.
(now)
Failed to enable unit, unit /etc/systemd/system/getty@tty1.service is masked.
The same error is emitted for enable and preset. And an error is emmited, not a
warning, so the failure to enable DefaultInstance is treated the same as if the
instance was specified on the command line. I think that this makes most sense,
for most template units.
Fixes #2513.
|
|
add a new tool for creating transient mount and automount units
|
|
Fix preset-all
|
|
A masked unit is listed in Also=:
$ systemctl cat test1 test2
→# /etc/systemd/system/test1.service
[Unit]
Description=test service 1
[Service]
Type=oneshot
ExecStart=/usr/bin/true
[Install]
WantedBy=multi-user.target
Also=test2.service
Alias=alias1.service
→# /dev/null
$ systemctl --root=/ enable test1
(before)
Created symlink /etc/systemd/system/alias1.service → /etc/systemd/system/test1.service.
Created symlink /etc/systemd/system/multi-user.target.wants/test1.service → /etc/systemd/system/test1.service.
The unit files have no installation config (WantedBy, RequiredBy, Also, Alias
settings in the [Install] section, and DefaultInstance for template units).
This means they are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
4) In case of template units, the unit is meant to be enabled with some
instance name specified.
(after)
Created symlink /etc/systemd/system/alias1.service → /etc/systemd/system/test1.service.
Created symlink /etc/systemd/system/multi-user.target.wants/test1.service → /etc/systemd/system/test1.service.
Unit /etc/systemd/system/test2.service is masked, ignoring.
|
|
Running preset-all on a system installed from rpms or even created
using make install would remove and recreate a lot of symlinks, changing
relative to absolute symlinks. In general relative symlinks are nicer,
so there is no reason to change them, and those spurious changes were
obscuring more interesting stuff.
$ make install DESTDIR=/var/tmp/inst1
$ systemctl preset-all --root=/var/tmp/inst1
(before)
Removed /var/tmp/inst1/etc/systemd/system/network-online.target.wants/systemd-networkd-wait-online.service.
Created symlink /var/tmp/inst1/etc/systemd/system/ctrl-alt-del.target → /usr/lib/systemd/system/exit.target.
Removed /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/remote-fs.target.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/remote-fs.target → /usr/lib/systemd/system/remote-fs.target.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/machines.target → /usr/lib/systemd/system/machines.target.
Created symlink /var/tmp/inst1/etc/systemd/system/sockets.target.wants/systemd-journal-remote.socket → /usr/lib/systemd/system/systemd-journal-remote.socket.
Removed /var/tmp/inst1/etc/systemd/system/sockets.target.wants/systemd-networkd.socket.
Created symlink /var/tmp/inst1/etc/systemd/system/sockets.target.wants/systemd-networkd.socket → /usr/lib/systemd/system/systemd-networkd.socket.
Removed /var/tmp/inst1/etc/systemd/system/getty.target.wants/getty@tty1.service.
Created symlink /var/tmp/inst1/etc/systemd/system/getty.target.wants/getty@tty1.service → /usr/lib/systemd/system/getty@.service.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/systemd-journal-upload.service → /usr/lib/systemd/system/systemd-journal-upload.service.
Removed /var/tmp/inst1/etc/systemd/system/sysinit.target.wants/systemd-timesyncd.service.
Created symlink /var/tmp/inst1/etc/systemd/system/sysinit.target.wants/systemd-timesyncd.service → /usr/lib/systemd/system/systemd-timesyncd.service.
Removed /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/systemd-resolved.service.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/systemd-resolved.service → /usr/lib/systemd/system/systemd-resolved.service.
Removed /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/systemd-networkd.service.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/systemd-networkd.service → /usr/lib/systemd/system/systemd-networkd.service.
(after)
Removed /var/tmp/inst1/etc/systemd/system/network-online.target.wants/systemd-networkd-wait-online.service.
Created symlink /var/tmp/inst1/etc/systemd/system/ctrl-alt-del.target → /usr/lib/systemd/system/exit.target.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/machines.target → /usr/lib/systemd/system/machines.target.
Created symlink /var/tmp/inst1/etc/systemd/system/sockets.target.wants/systemd-journal-remote.socket → /usr/lib/systemd/system/systemd-journal-remote.socket.
Created symlink /var/tmp/inst1/etc/systemd/system/multi-user.target.wants/systemd-journal-upload.service → /usr/lib/systemd/system/systemd-journal-upload.service.
|
|
No functional change intended.
|
|
Before, when interating over unit files during preset-all, behaviour was the
following:
- if we hit the real unit name first, presets were queried for that name, and
that unit was enabled or disabled accordingly,
- if we hit an alias first (one of the symlinks chaining to the real unit), we
checked the presets using the symlink name, and then proceeded to enable or
disable the real unit.
E.g. for systemd-networkd.service we have the alias dbus-org.freedesktop.network1.service
(/usr/lib/systemd/system/dbus-org.freedesktop.network1.service), but the preset
is only for the systemd-networkd.service name. The service would be enabled or
disabled pseudorandomly depending on the order of iteration.
For "preset", behaviour was analogous: preset on the alias name disabled the
service (following the default disable policy), preset on the "real" name
applied the presets.
With the patch, for "preset" and "preset-all" we silently skip symlinks. This
gives mostly the right behaviour, with the limitation that presets on aliases
are ignored. I think that presets on aliases are not that common (at least my
preset files on Fedora don't exhibit any such usage), and should not be
necessary, since whoever installs the preset can just refer to the real unit
file. It would be possible to overcome this limitation by gathering a list of
names of a unit first, and then checking whether *any* of the names matches the
presets list. That would require a significant redesign of the code, and be
a lot slower (since we would have to fully read all unit directories to preset
one unit) to so I'm not doing that for now.
With this patch, two properties are satisfied:
- preset-all and preset are idempotent, and the second and subsequent invocations
do not produce any changes,
- preset-all and preset for a specific name produce the same state for that unit.
Fixes #3616.
|
|
|
|
If the directory is missing, we can assume that those pesky symlinks are gone too.
|
|
|
|
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
|
|
|
|
|
|
This is done exactly the same way a couple of times at various places, let's
unify this into one version.
|
|
core: add cgroup CPU controller support on the unified hierarchy
(zj: merging not squashing to make it clear against which upstream this patch was developed.)
|
|
Under NixOS, the config_path /etc/systemd/system is a symlink to
/etc/static/systemd/system. Commands such as `systemctl list-unit-files`
and `systemctl is-enabled` did not work as the symlink was not followed.
This does not affect how symlinks are treated within the config_path
directory.
|
|
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
|
|
|
|
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.
No functional changes.
|
|
This permits CPUQuota to accept greater values as documented.
|
|
This new output mode formats all timestamps using the usual format_timestamp()
call we use pretty much everywhere else. Timestamps formatted this way are some
ways more useful than traditional syslog timestamps as they include weekday,
month and timezone information, while not being much longer. They are also not
locale-dependent. The primary advantage however is that they may be passed
directly to journalctl's --since= and --until= switches as soon as #3869 is
merged.
While we are at it, let's also add "short-unix" to shell completion.
|
|
This setting adds minimal user namespacing support to a service. When set the invoked
processes will run in their own user namespace. Only a trivial mapping will be
set up: the root user/group is mapped to root, and the user/group of the
service will be mapped to itself, everything else is mapped to nobody.
If this setting is used the service runs with no capabilities on the host, but
configurable capabilities within the service.
This setting is particularly useful in conjunction with RootDirectory= as the
need to synchronize /etc/passwd and /etc/group between the host and the service
OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the
user of the service itself. But even outside the RootDirectory= case this
setting is useful to substantially reduce the attack surface of a service.
Example command to test this:
systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh
This runs a shell as user "foobar". When typing "ps" only processes owned by
"root", by "foobar", and by "nobody" should be visible.
|
|
|
|
User expectations are broken when "systemctl enable /some/path/service.service"
behaves differently to "systemctl link ..." followed by "systemctl enable".
From user's POV, "enable" with the full path just combines the two steps into
one.
Fixes #3010.
|