Age | Commit message (Collapse) | Author |
|
This basically reverts 7b2fd9d51259f6cf350791434e640ac3519acc6c ("core:
remove duplicate code in automount_update_mount()").
This was not duplicate code. The expire_tokens need to be handled as well:
Send 0 == success for MOUNT_DEAD (umount successful), do nothing for
MOUNT_UNMOUNTING (not yet done) and an error for everything else.
Otherwise the automount logic will assume unmounting is not done and will
not send any new requests for mounting. As a result, the corresponding
mount unit is never mounted.
Without this, automounts with TimeoutIdleSec= are broken. Once the idle
timeout triggered a umount, any access to the corresponding filesystem
hangs forever.
Fixes #3332.
|
|
New exec boolean MemoryDenyWriteExecute, when set, installs
a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC
and mprotect(2) with PAGE_EXEC.
|
|
core: log cgroup legacy and unified hierarchy setting translations
|
|
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit. While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids. There's no point in introducing another term. Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.
|
|
To accommodate changes in kernel interface, cgroup unified hierarchy support
added several configuration items which overlap with the existing resource
control settings and there is simple config translation between the overlapping
settings to ease the transition. As why certain cgroup knobs are being
configured can become confusing, this patch adds a master warning message which
is printed once when such translation is first used and logs each translation
with a debug message.
v2:
- Switched to log_unit*().
|
|
cgroup_context_apply() and friends take CGroupContext and cgroup path as input
and has no way of getting back to the associated Unit and thus uses raw cgroup
path for logging. This makes the log messages difficult to track down.
There's no reason to avoid passing in Unit into these functions. Pass in Unit
and use log_unit*() instead.
While at it, make cgroup_context_apply(), which has no outside users, static.
Also, drop cgroup path from log messages where the path itself isn't too
interesting and can be easily obtained from the unit.
|
|
Implement sets of system calls to help constructing system call
filters. A set starts with '@' to distinguish from a system call.
Closes: #3053, #3157
|
|
The current raw_clone function takes two arguments, the cloning flags and
a pointer to the stack for the cloned child. The raw cloning without
passing a "thread main" function does not make sense if a new stack is
specified, as it returns in both the parent and the child, which will fail
in the child as the stack is virgin. All uses of raw_clone indeed pass NULL
for the stack pointer which indicates that both processes should share the
stack address (so you better don't pass CLONE_VM).
This commit refactors the code to not require the caller to pass the stack
address, as NULL is the only sensible option. It also adds the magic code
needed to make raw_clone work on sparc64, which does not return 0 in %o0
for the child, but indicates the child process by setting %o1 to non-zero.
This refactoring is not plain aesthetic, because non-NULL stack addresses
need to get mangled before being passed to the clone syscall (you have to
apply STACK_BIAS), whereas NULL must not be mangled. Implementing the
conditional mangling of the stack address would needlessly complicate the
code.
raw_clone is moved to a separete header, because the burden of including
the assert machinery and sched.h shouldn't be applied to every user of
missing_syscalls.h
|
|
This reverts part of #3329, but all for a good cause.
|
|
unit_write_drop_in{,_private}{,_format} are all affected.
We already append a header to the file (and section markers), so those functions
can only be used to write a whole file at once. Including the newline at
the end feels natural.
After this commit newlines will be duplicated. They will be removed in
subsequent commit.
Also, rewrap the "autogenerated" header to fit within 80 columns.
|
|
various sd-Ipv4ll/sd-ipv4acd fixes
|
|
On the unified hierarchy, memory controller implements three control knobs -
low, high and max which enables more useable and versatile control over memory
usage. This patch implements support for the three control knobs.
* MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and
memory.max, respectively.
* As all absolute limits on the unified hierarchy use "max" for no limit, make
memory limit parse functions accept "max" in addition to "infinity" and
document "max" for the new knobs.
* Implement compatibility translation between MemoryMax and MemoryLimit.
v2:
- Fixed missing else's in config_parse_memory_limit().
- Fixed missing newline when writing out drop-ins.
- Coding style updates to use "val > 0" instead of "val".
- Minor updates to documentation.
|
|
dbus-cgroup fixes
|
|
|
|
|
|
Except for per-device BlockIO, IO and DeviceAllow/Deny settings, all were
missing newline causing the next drop-in to be concatenated at the end of the
line. Fix it.
|
|
bus_cgroup_set_property() was rejecting if the input value was in range.
Reverse it.
|
|
Recently added cgroup helper functions break the style convention. Fix them
up.
|
|
|
|
Implement compat translation between IO* and BlockIO* settings
|
|
free_and_strdup handles NULL but not empty strings.
See also:
https://github.com/systemd/systemd/pull/3283#issuecomment-220603145
https://github.com/systemd/systemd/pull/3307
|
|
Adds support to core for systemd D-Bus clients to send the
`SELinuxContext` property . This means `systemd-run -p
SELinuxContext=foo` should now work.
|
|
free_and_strdup already handles the NULL case for us, so we can remove
an extraneous conditional check.
As noted in https://github.com/systemd/systemd/pull/3279/files#r63687717
|
|
Due to the substantial interface changes in cgroup unified hierarchy, new IO
settings are introduced. Currently, IO settings apply only to unified
hierarchy and BlockIO to legacy. While the transition is necessary, it's
painful for users to have to provide configs for both. This patch implements
translation from one config set to another for configs which make sense.
* The translation takes place during application of the configs. Users won't
see IO or BlockIO settings appearing without being explicitly created.
* The translation takes place only if there is no config for the matching
cgroup hierarchy type at all.
While this doesn't provide comprehensive compatibility, it should considerably
ease transition to the new IO settings which are a superset of BlockIO
settings.
v2:
- Update test-cgroup-mask.c so that it accounts for the fact that
CGROUP_MASK_IO and CGROUP_MASK_BLKIO move together. Also, test/parent.slice
now sets IOWeight instead of BlockIOWeight.
|
|
Factor out the following functions out of cgroup_context_apply()
* cgroup_context_[blk]io_weight()
* cgroup_apply_[blk]io_device_weight()
* cgroup_apply_[blk]io_device_limit()
This is pure refactoring and shouldn't cause any functional differences.
|
|
CGroupBlockIODeviceBandwith is used to keep track of IO bandwidth limits for
legacy cgroup hierarchies. Unlike the unified hierarchy counterpart
CGroupIODeviceLimit, a CGroupBlockIODeviceBandwiddth records either a read or
write limit and has a couple issues.
* There's no way to clear specific config entry.
* When configs are cleared for an IO direction of a unit, the kernel settings
aren't cleared accordingly creating discrepancies.
This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to
CGroupIODeviceLimit - each entry records both rbps and wbps limits and is
cleared if both are at default values after kernel settings are updated.
|
|
cgroup IO controller supports maximum limits for both bandwidth and IOPS but
systemd resource control currently only supports bandwidth limits. This patch
adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy
is in use.
It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy
hierarchies but IO control on legacy hierarchies is half-broken anyway, so
let's leave it alone for now.
|
|
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places. This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.
This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
|
|
Makes it consistent with the other branches here.
|
|
core: add io controller support on the unified hierarchy
|
|
We currently generate log message about unit being started even when
unit was started already and job didn't do anything. This is because job
was requested explicitly and hence became anchor job of the transaction
thus we could not eliminate it. That is fine but, let's not pollute
journal with useless log messages.
$ systemctl start systemd-resolved
$ systemctl start systemd-resolved
$ systemctl start systemd-resolved
Current state:
$ journalctl -u systemd-resolved | grep Started
May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution.
After patch applied:
$ journalctl -u systemd-resolved | grep Started
May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution.
Fixes #1723
|
|
Private /dev will not be managed by udev or others, so we can make it
noexec and readonly after we have made all device nodes. As /dev/shm
needs to be writable, we can't use bind_remount_recursive().
|
|
unit_set_slice() fails with -EBUSY if the unit already has a slice associated
with it. This makes it impossible to override slice through dropin config or
over dbus. There's no reason to disallow slice changes as long as cgroups
aren't realized. Fix it.
Fixes #3240.
Signed-off-by: Tejun Heo <htejun@fb.com>
Reported-by: Davide Cavalca <dcavalca@fb.com>
|
|
Drop all dangling old /dev mounts before mounting a new private /dev tree.
|
|
This new method returns information by unit names. Instead of ListUnitsByPatterns
this method returns information of inactive and even unexisting units.
Moved dbus unit reply logic into a separate shared function.
Resolves https://github.com/coreos/fleet/pull/1418
|
|
Fix "preset-all" with dangling symlinks and install-section hint emitted too eagerly
|
|
|
|
don't reopen socket fds when reloading the daemon
|
|
|
|
Cgroup fixes.
|
|
core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification
|
|
|
|
Before:
$ systemctl show --property TriggerLimitIntervalSec test.socket
TriggerLimitIntervalSec=2000000
After:
$ systemctl show --property TriggerLimitIntervalUSec test.socket
TriggerLimitIntervalUSec=2s
|
|
Fixes: #3194
|
|
Previously, we'd simply close and reopen the socket file descriptors. This is
problematic however, as we won't transition through the SOCKET_CHOWN state
then, and thus the file ownership won't be correct for the sockets.
Rework the flushing logic, and actually read any queued data from the sockets
for flushing, and accept any queued messages and disconnect them.
|
|
Previously, when the daemon was reloaded and the configuration of a socket unit
file was changed so that a different set of socket ports was defined for the
socket we'd simply reopen the socket fds not yet open. This is problematic
however, as this means the SOCKET_CHOWN state is not run for them, and thus
their UID/GID is not corrected.
With this change, don't open the missing file descriptors, but log about this
issue, and ask the user to restart the socket explicit, to make sure all
missing fds are opened.
Fixes: #3171
|
|
This should bring no behavioural change.
|
|
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.
* blkio.weight and blkio.weight_device are consolidated into io.weight which
uses the standardized weight range [1, 10000] with 100 as the default value.
* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
Expansion of throttling features is being worked on to support
work-conserving absolute limits (io.low and io.high).
* All stats are consolidated into io.stats.
This patchset adds support for the new interface. As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.
* io.weight handling is mostly identical to blkio.weight[_device] handling
except that the weight range is different.
* Both read and write bandwidth settings are consolidated into
CGroupIODeviceLimit which describes all limits applicable to the device.
This makes it less painful to add new limits.
* "max" can be used to specify the maximum limit which is equivalent to no
config for max limits and treated as such. If a given CGroupIODeviceLimit
doesn't contain any non-default configs, the config struct is discarded once
the no limit config is applied to cgroup.
* lookup_blkio_device() is renamed to lookup_block_device().
Signed-off-by: Tejun Heo <htejun@fb.com>
|
|
The unit file settings are called SocketUser= and SocketGroup= hence name these
fields that way in the "systemd-analyze dump" output too.
https://github.com/systemd/systemd/issues/3171#issuecomment-216216995
|
|
Let's lower the default values a bit, and pick different defaults for
Accept=yes and Accept=no sockets.
Fixes: #3167
|