Age | Commit message (Collapse) | Author |
|
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.
No functional changes.
|
|
This permits CPUQuota to accept greater values as documented.
|
|
|
|
As suggested by @mbiebl we already use the "!" special char in unit file
assignments for negation, hence we should not use it in a different context for
privileged execution. Let's use "+" instead.
|
|
Let's verify the validity of the syntax of the user/group names set.
|
|
Just in case...
|
|
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
|
|
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall number on its own
and allowing proper multiplexed syscalls filtering.
|
|
|
|
various changes, most importantly regarding memory metrics
|
|
If for whatever reason the file system is "corrupted", we want
to be resilient and ignore the error, as long as we can load the units
from a different place.
Arch bug https://bugs.archlinux.org/task/49547.
A user had an ntfs symlink (essentially a file) instead of a directory after
restoring from backup. We should just ignore that like we would treat a missing
directory, for general resiliency.
We should treat permission errors similarly. For example an unreadable
/usr/local/lib directory would prevent (user) instances of systemd from
loading any units. It seems better to continue.
|
|
The various bits of code did the scaling all different, let's unify this,
given that the code is not trivial.
|
|
settings
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
|
|
And port a couple of users over to it.
|
|
This patch implements the new magic character '!'. By putting '!' in front
of a command, systemd executes it with full privileges ignoring paramters
such as User, Group, SupplementaryGroups, CapabilityBoundingSet,
AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext,
AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.
Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482
Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service
(You can use the example below)
3. sudo systemctl start ext-perm.service
4. Verify that the commands starting with '!' were not executed as bob,
4.1 Looking to the output of ls -l /tmp/exec-perm
4.2 Each file contains the result of the id command.
`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm
[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ;
/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ;
!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"
[Install]
WantedBy=multi-user.target]
`````````````````````````````````````````````````````````````````
|
|
an instance (#3451)
Corrects: 7aad67e7
Fixes: #3438
|
|
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit. While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids. There's no point in introducing another term. Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.
|
|
Implement sets of system calls to help constructing system call
filters. A set starts with '@' to distinguish from a system call.
Closes: #3053, #3157
|
|
On the unified hierarchy, memory controller implements three control knobs -
low, high and max which enables more useable and versatile control over memory
usage. This patch implements support for the three control knobs.
* MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and
memory.max, respectively.
* As all absolute limits on the unified hierarchy use "max" for no limit, make
memory limit parse functions accept "max" in addition to "infinity" and
document "max" for the new knobs.
* Implement compatibility translation between MemoryMax and MemoryLimit.
v2:
- Fixed missing else's in config_parse_memory_limit().
- Fixed missing newline when writing out drop-ins.
- Coding style updates to use "val > 0" instead of "val".
- Minor updates to documentation.
|
|
CGroupBlockIODeviceBandwith is used to keep track of IO bandwidth limits for
legacy cgroup hierarchies. Unlike the unified hierarchy counterpart
CGroupIODeviceLimit, a CGroupBlockIODeviceBandwiddth records either a read or
write limit and has a couple issues.
* There's no way to clear specific config entry.
* When configs are cleared for an IO direction of a unit, the kernel settings
aren't cleared accordingly creating discrepancies.
This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to
CGroupIODeviceLimit - each entry records both rbps and wbps limits and is
cleared if both are at default values after kernel settings are updated.
|
|
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places. This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.
This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
|
|
core: add io controller support on the unified hierarchy
|
|
|
|
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.
* blkio.weight and blkio.weight_device are consolidated into io.weight which
uses the standardized weight range [1, 10000] with 100 as the default value.
* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
Expansion of throttling features is being worked on to support
work-conserving absolute limits (io.low and io.high).
* All stats are consolidated into io.stats.
This patchset adds support for the new interface. As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.
* io.weight handling is mostly identical to blkio.weight[_device] handling
except that the weight range is different.
* Both read and write bandwidth settings are consolidated into
CGroupIODeviceLimit which describes all limits applicable to the device.
This makes it less painful to add new limits.
* "max" can be used to specify the maximum limit which is equivalent to no
config for max limits and treated as such. If a given CGroupIODeviceLimit
doesn't contain any non-default configs, the config struct is discarded once
the no limit config is applied to cgroup.
* lookup_blkio_device() is renamed to lookup_block_device().
Signed-off-by: Tejun Heo <htejun@fb.com>
|
|
|
|
This reverts commit 6d10d308c6cd16528ef58fa4f5822aef936862d3.
It got squashed by mistake.
|
|
This way it can be used in install.c in subsequent commit.
|
|
If the user defines a symlink alias for a unit whose type does not support
aliasing, detect this early and print a nice warning.
Fixe: #2730
|
|
Many fixes, in particular to the install logic
|
|
Closes #1602
|
|
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.
Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in one or the user context.
|
|
A long time ago – when generators where first introduced – the directories for
them were randomly created via mkdtemp(). This was changed later so that they
use fixed name directories now. Let's make use of this, and add the genrator
dirs to the LookupPaths structure and into the unit file search path maintained
in it. This has the benefit that the generator dirs are now normal part of the
search path for all tools, and thus are shown in "systemctl list-unit-files"
too.
|
|
* core/unit: extract checking of stat paths into helper function
The same code was repeated three times.
* core: treat masked files as "unchanged"
systemctl prints the "unit file changed on disk" warning
for a masked unit. I think it's better to print nothing in that
case.
When a masked unit is loaded, set mtime as 0. When checking
if a unit with mtime of 0 needs reload, check that the mask
is still in place.
* test-dnssec: fix build without gcrypt
Also reorder the test functions to follow the way they are called
from main().
|
|
systemctl prints the "unit file changed on disk" warning
for a masked unit. I think it's better to print nothing in that
case.
When a masked unit is loaded, set mtime as 0. When checking
if a unit with mtime of 0 needs reload, check that the mask
is still in place.
|
|
If first attempt to merge units failed and we are trying to do
merge the other way around and at the same time we are working with
template name, then other unit can't possibly be template, because it is
not possible to have template unit running, only instances of the
template. Thus we need to look for already active instance instead.
|
|
|
|
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
|
|
This reverts commit cb48dfca6a8bc15d9081651001a16bf51e03838a.
Exec*-settings resolve specifiers twice:
%%U -> config_parse_exec [cb48dfca6a8] -> %U -> service_spawn -> 0
Fixes #2637
|
|
The setting is hardly useful (since its effect is generally reduced to zero due
to file system caps), and with the advent of ambient caps an actually useful
replacement exists, hence let's get rid of this.
I am pretty sure this was unused and our man page already recommended against
its use, hence this should be a safe thing to remove.
|
|
This feature will not be used anytime soon, so remove a bit of cruft.
The BusPolicy= config directive will stay around as compat noop.
|
|
cgroup: remove support for NetClass= directive
|
|
Support for net_cls.class_id through the NetClass= configuration directive
has been added in v227 in preparation for a per-unit packet filter mechanism.
However, it turns out the kernel people have decided to deprecate the net_cls
and net_prio controllers in v2. Tejun provides a comprehensive justification
for this in his commit, which has landed during the merge window for kernel
v4.5:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671
As we're aiming for full support for the v2 cgroup hierarchy, we can no
longer support this feature. Userspace tool such as nftables are moving over
to setting rules that are specific to the full cgroup path of a task, which
obsoletes these controllers anyway.
This commit removes support for tweaking details in the net_cls controller,
but keeps the NetClass= directive around for legacy compatibility reasons.
|
|
Corrects an incompatibility introduced with 36c16a7cdd6c33d7980efc2cd6a2211941f302b4.
Fixes: #2537
|
|
Let's make things more obvious by placing the parse_usec() invocation directly in config_parse_service_timeout().
|
|
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
|
|
This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as
indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code
(following the logic that 0 means "no", and USEC_INFINITY means "never").
This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated,
and sd-event considers it has indication for turning off the event source.
This also alters the deserialization of the units to restart timeouts from the time they were originally started from.
Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to
artificially prolonged timeouts if a daemon reload took place.
Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a
specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.
This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is
removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn()
calls fail.
Fixes: #2249
|
|
This way we can reuse it for parsing rlimit settings in "systemctl set-property" and related commands.
|
|
[v1] core: resolve specifier in config_parse_exec()
|
|
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.
You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.
An example system service file might look like this:
[Unit]
Description=Service for testing caps
[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW
After starting the service it has these capabilities:
CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
|
|
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
|