Age | Commit message (Collapse) | Author |
|
If /foo is marked to be read-only, and /foo/bar too, then the latter may be
suppressed as it has no effect.
|
|
|
|
Implicitly make all dirs set with RuntimeDirectory= writable, as the concept
otherwise makes no sense.
|
|
This adds a new call get_user_creds_clean(), which is just like
get_user_creds() but returns NULL in the home/shell parameters if they contain
no useful information. This code previously lived in execute.c, but by
generalizing this we can reuse it in run.c.
|
|
|
|
If a dir is marked to be inaccessible then everything below it should be masked
by it.
|
|
ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
|
|
Let's make sure that all our rules apply to all archs the local kernel
supports.
|
|
In preparation for adding a version which takes a strv.
|
|
Drop more kdbus functionality
|
|
Previous fix didn't consider handling multiple ExecStop commands.
|
|
If the unit is in the dbus queue when it is removed then the last change
signal is never sent. Fix this by checking the dbus queue and explicitly
send the change signal before sending the remove signal.
|
|
|
|
ENOTCONN may be a legitimate return code if the endpoint disappeared,
but the service should still attempt to shutdown cleanly.
|
|
In https://github.com/systemd/systemd/pull/4004 , a runtime detection
method for seccomp was added. However, it does not detect the case
where CONFIG_SECCOMP=y but CONFIG_SECCOMP_FILTER=n. This is possible
if the architecture does not support filtering yet.
Add a check for that case too.
While at it, change get_proc_field usage to use PR_GET_SECCOMP prctl,
as that should save a few system calls and (unnecessary) allocations.
Previously, reading of /proc/self/stat was done as recommended by
prctl(2) as safer. However, given that we need to do the prctl call
anyway, lets skip opening, reading and parsing the file.
Code for checking inspired by
https://outflux.net/teach-seccomp/autodetect.html
|
|
Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls
controls "memory.swap.max" attribute in unified cgroup.
|
|
Resolves #3534
|
|
Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls
controls "memory.swap.max" attribute in unified cgroup.
|
|
|
|
"-f" switch
|
|
Resolves #3534
|
|
"-l" switch (#3827)
|
|
permit bus clients to pin units to avoid automatic GC
|
|
functions (#4019)
Prevents discard-qualifiers warnings when the passed variable was const
|
|
Fixes #3882
|
|
|
|
It is useful for clients to be able to read the last CPU usage counter value of
a unit even if the unit is already terminated. Hence, before destroying a
cgroup's cgroup cache the last CPU usage counter and return it if the cgroup is
gone.
|
|
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface.
The two calls may be used by clients to pin a unit into memory, so that various
runtime properties aren't flushed out by the automatic GC. This is necessary
to permit clients to race-freely acquire runtime results (such as process exit
status/code or accumulated CPU time) on successful service termination.
Ref() and Unref() are fully recursive, hence act like the usual reference
counting concept in C. Taking a reference is a privileged operation, as this
allows pinning units into memory which consumes resources.
Transient units may also gain a reference at the time of creation, via the new
AddRef property (that is only defined for transient units at the time of
creation).
|
|
|
|
Rework console color setup
|
|
Journald dynamic users
|
|
Dynamic users should be treated like system users, and their logs
should end up in the main system journal.
|
|
Also return the first error, since it's most likely to be interesting.
If unlink fails, symlink will usually return EEXIST.
|
|
The parsing functions for [User]TasksMax were inconsistent. Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax. Update them so that they're consistent with other knobs.
* Empty string indicates the default value.
* "infinity" indicates no limit.
While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.
v2: Update empty string to indicate the default value as suggested by Zbigniew
Jędrzejewski-Szmek.
v3: Fixed empty UserTasksMax handling.
|
|
This reverts commit affd7ed1a923b0df8479cff1bd9eafb625fdaa66.
> So it looks like make_console_stdio() has bad side effect. More specifically it
> does a TIOCSCTTY ioctl (via acquire_terminal()) which sees to disturb the
> process which was using/owning the console.
Fixes #3842.
https://bugs.debian.org/834367
https://bugzilla.redhat.com/show_bug.cgi?id=1367766
|
|
This should make it easier to figure things out.
|
|
dbus-daemon does NSS name look-ups in order to enforce its bus policy. This
might dead-lock if an NSS module use wants to use D-Bus for the look-up itself,
like our nss-systemd does. Let's work around this by bypassing bus
communication in the NSS module if we run inside of dbus-daemon. To make this
work we keep a bit of extra state in /run/systemd/dynamic-uid/ so that we don't
have to consult the bus, but can still resolve the names.
Note that the normal codepath continues to be via the bus, so that resolving
works from all mount namespaces and is subject to authentication, as before.
This is a bit dirty, but not too dirty, as dbus daemon is kinda special anyway
for PID 1.
|
|
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
|
|
This makes it easier to discern the relevant and obsolete parts of the vtables,
and in particular helps when comparing introspection data with the actual
vtable definitions.
|
|
|
|
|
|
Currently, systemd uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, systemd uses a named legacy hierarchy
mounted on /sys/fs/cgroup/systemd without any kernel controllers for process
management. Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.
Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for systemd to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies. It can simply mount the unified hierarchy under
/sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test. In time, this would also allow deleting the said
complexities.
This patch updates systemd so that it prefers the unified hierarchy for the
systemd cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.
* cg_unified(@controller) is introduced which tests whether the specific
controller in on unified hierarchy and used to choose the unified hierarchy
code path for process and service management when available. Kernel
controller specific operations remain gated by cg_all_unified().
* "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to
force the use of legacy hierarchy for systemd cgroup controller.
* nspawn: By default nspawn uses the same hierarchies as the host. If
UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If
0, legacy for all.
* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
three options - legacy, only systemd controller on unified, and unified. The
value is passed into mount setup functions and controls cgroup configuration.
* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
appropriate action depending on the configuration of the host.
v2: - CGroupUnified enum replaces open coded integer values to indicate the
cgroup operation mode.
- Various style updates.
v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
v4: Restored legacy container on unified host support and fixed another bug in
detect_unified_cgroup_hierarchy().
|
|
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies. This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is. In preparation, this patch renames cg_unified() to
cg_all_unified().
This patch doesn't cause any functional changes.
|
|
core: add cgroup CPU controller support on the unified hierarchy
(zj: merging not squashing to make it clear against which upstream this patch was developed.)
|
|
|
|
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
|
|
|
|
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
|
|
Fix 3607
|
|
|