Age | Commit message (Collapse) | Author |
|
A 'llu' formatting statement was used in a debugging printf statement
instead of a 'PRIu64'. Correcting that mistake here.
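As an illustration of the fix described above, here is a minimal sketch of formatting a 64-bit value portably. The helper name format_u64() and the "value=" prefix are made up for this example and do not come from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a 64-bit counter portably.
 * "%llu" expects unsigned long long, which is not guaranteed to be
 * the type behind uint64_t; PRIu64 expands to the matching conversion. */
static int format_u64(char *buf, size_t size, uint64_t value) {
        return snprintf(buf, size, "value=%" PRIu64, value);
}
```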
|
|
manager: Only invoke a single sigchld per unit within a cleanup cycle
|
|
make "machinectl clean" asynchronous, and open it up via PolicyKit
|
|
By default, each iteration of manager_dispatch_sigchld() results in a unit level
sigchld event being invoked. For scope units, this results in a scope_sigchld_event()
which can seemingly stall for workloads that have a large number of PIDs within the
scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry
as a result of pid_is_unwaited().
v2:
This patch resolves this condition by only paying the cost of a sigchld in the underlying
scope unit once per sigchld iteration. A new "sigchldgen" member resides within the
Unit struct. The Manager's iteration counter is incremented via the sd event loop, accessed via
sd_event_get_iteration, and the Unit member is set to the same value as the manager each
time that a sigchld event is invoked. If the Manager iteration value and Unit member
match, the sigchld event is not invoked for that iteration.
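The generation check described above can be sketched roughly as follows. The structs are trimmed-down stand-ins, not the real Manager and Unit, and the function name is hypothetical; in the real code the iteration value would come from sd_event_get_iteration().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real Manager and Unit structs. */
typedef struct Manager {
        uint64_t iteration;      /* would be read via sd_event_get_iteration() */
} Manager;

typedef struct Unit {
        uint64_t sigchldgen;     /* iteration in which sigchld was last handled */
        unsigned sigchld_calls;  /* counts the (expensive) handler invocations */
} Unit;

/* Returns true if the unit-level handler ran, false if it was skipped. */
static bool unit_dispatch_sigchld_once(Manager *m, Unit *u) {
        if (u->sigchldgen == m->iteration)
                return false;    /* already handled in this iteration */

        u->sigchldgen = m->iteration;
        u->sigchld_calls++;      /* stands in for the unit's sigchld_event() */
        return true;
}
```

With this shape, a scope with many PIDs pays for the per-PID walk at most once per event loop iteration, however many children are reaped in that iteration.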
|
|
Commit 3a18b60489504056f9b0b1a139439cbfa60a87e1 introduced a regression that
disabled the color mode for containers.
This patch fixes this.
|
|
|
|
It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and
SCHED_DEADLINE is blocked, which may be used to lock up the system.
|
|
Let's politely refuse with EPERM rather than kill the whole thing right-away.
|
|
This was forgotten when MemoryDenyWriteExecute= was added: we should set NNP in
all cases when we set seccomp filters.
|
|
It's a function defined by us, hence we should look for the error in its return
value, not in "errno".
|
|
This is a fix-up for 2a9a6f8ac04a69ca36d645f9305a33645f22a22b which covered
non-transient units, but missed the case for transient units.
|
|
Add sd_notify() parameter to change watchdog_usec during runtime.
An application can change the watchdog_usec value with sd_notify(),
for example: sd_notify(0, "WATCHDOG_USEC=20000000").
To reset watchdog_usec to the value configured in the service file,
restart the service.
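A small sketch of building that notification string from a numeric timeout, so the value does not have to be hard-coded. The helper name format_watchdog_usec() is hypothetical; the resulting buffer would then be handed to sd_notify(0, buf).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: builds "WATCHDOG_USEC=<usec>" in buf.
 * Returns the snprintf() result (length, or negative on error). */
static int format_watchdog_usec(char *buf, size_t size, uint64_t usec) {
        return snprintf(buf, size, "WATCHDOG_USEC=%" PRIu64, usec);
}
```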
Note:
sd_event is not currently supported. If the application uses
sd_event_set_watchdog or sd_watchdog_enabled, do not use the
"WATCHDOG_USEC" option through sd_notify.
|
|
Fix console log color
|
|
Also we had to connect PID's stdio to null later since colors_enabled()
assumes that stdout is connected to the console.
|
|
When systemd is started by the kernel, the kernel sets the TERM
environment variable unconditionally to "linux" no matter the console
device used. This might be an issue for dumb devices with no colors
support.
This patch uses default_term_for_tty() for getting a more accurate
value. But it makes sure to keep the user preferences (if any) which
might be passed via the kernel command line. For that purpose /proc
should be mounted.
|
|
Jun 16 05:12:08 systemd[1]: Controller 'io' supported: yes
Jun 16 05:12:08 systemd[1]: Controller 'memory' supported: yes
Jun 16 05:12:08 systemd[1]: Controller 'pids' supported: yes
instead of
Jun 16 04:06:50 systemd[1]: Controller 'memory' supported: yes
Jun 16 04:06:50 systemd[1]: Controller 'devices' supported: yes
Jun 16 04:06:50 systemd[1]: Controller 'pids' supported: yes
|
|
This reverts commit ce8aba568156f2b9d0d3b023e960cda3d9d7db81.
We should pass an environment as close as possible to what we originally
got.
|
|
When re-executed, reconnect the console to PID1's stdio, as was the case
when PID1 was initially started by the kernel.
|
|
Delete the dbus1 generator and some critical wiring. This prevents
kdbus from being loaded or detected. As such, it will never be used,
even if the user still has a useful kdbus module loaded on their system.
Sort of fixes #3480. Not really, but it's better than the current state.
|
|
various changes, most importantly regarding memory metrics
|
|
Permit services to detect whether their stdout/stderr is connected to the journal.
|
|
If for whatever reason the file system is "corrupted", we want
to be resilient and ignore the error, as long as we can load the units
from a different place.
Arch bug https://bugs.archlinux.org/task/49547.
A user had an ntfs symlink (essentially a file) instead of a directory after
restoring from backup. We should just ignore that like we would treat a missing
directory, for general resiliency.
We should treat permission errors similarly. For example an unreadable
/usr/local/lib directory would prevent (user) instances of systemd from
loading any units. It seems better to continue.
|
|
executed services
This permits services to detect whether their stdout/stderr is connected to the
journal, and if so talk to the journal directly, thus permitting carrying of
metadata.
As requested by the gtk folks: #2473
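The message above does not spell out the mechanism. Assuming it is the JOURNAL_STREAM environment variable as described in current systemd documentation (a "device:inode" pair identifying the journal stream), a service-side check could look roughly like this. The function name is hypothetical.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns true if fd refers to the device:inode
 * pair named in spec (assumed to be the value of $JOURNAL_STREAM). */
static bool fd_matches_journal_stream(int fd, const char *spec) {
        uintmax_t dev, ino;
        struct stat st;

        if (!spec || sscanf(spec, "%ju:%ju", &dev, &ino) != 2)
                return false;    /* unset or malformed variable */

        if (fstat(fd, &st) < 0)
                return false;

        return (uintmax_t) st.st_dev == dev && (uintmax_t) st.st_ino == ino;
}
```

A service would typically call it as fd_matches_journal_stream(STDERR_FILENO, getenv("JOURNAL_STREAM")) and switch to the native journal protocol only when it returns true.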
|
|
|
|
Super-important change, yeah!
|
|
Fix-up for 2a9a6f8ac04a69ca36d645f9305a33645f22a22b
|
|
The unit files already accept relative, percent-based memory limit
specification, let's make sure "systemctl set-property" support this too.
Since we want the physical memory size of the destination machine to apply we
pass the percentage in a new set of properties that only exist for this
purpose, and can only be set.
|
|
The various bits of code did the scaling all differently, let's unify this,
given that the code is not trivial.
|
|
The latter is a kernelism, we only understand "infinity".
|
|
When parsing unit files we already refuse unit memory limits of zero, let's
also refuse it when the value is set via the bus.
|
|
settings
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
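A minimal sketch of the percentage-to-bytes scaling, assuming integer percentages. The helper name is hypothetical, and treating 0% and anything above 100% as invalid is an assumption made for this example.

```c
#include <stdint.h>

/* Hypothetical helper: scales a percentage of installed RAM to bytes.
 * Returns 0 for inputs this sketch treats as invalid (0% or > 100%). */
static uint64_t memory_limit_from_percent(uint64_t total_ram, unsigned percent) {
        if (percent == 0 || percent > 100)
                return 0;

        /* Split the division so total_ram * percent cannot overflow;
         * the result equals floor(total_ram * percent / 100). */
        return total_ram / 100 * percent + total_ram % 100 * percent / 100;
}
```

On Linux the total could be obtained, for instance, from sysconf(_SC_PHYS_PAGES) multiplied by sysconf(_SC_PAGESIZE).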
|
|
And port a couple of users over to it.
|
|
When a unit is marked as UNSURE, we keep trying to find out whether its state
has changed, over and over again. So let's not go through the UNSURE states
again. Also, when we find a GOOD unit, let's propagate the GOOD state to
all units that this unit references.
This is a problem on machines with a lot of initscripts with different
starting priority, since those units will reference each other and the
original algorithm might get to n! complexity.
Thanks HATAYAMA Daisuke for the expand_good_state code.
|
|
notifications (#3531)
Fixes: #3483
|
|
Typing `rd.rescue` is easier than `rd.systemd.unit=rescue.target`.
|
|
Move the merger of environment variables before setting up the PAM
session and pass the aggregate environment to PAM setup. This allows
control over the PAM session hooks through environment variables.
PAM session initiation may update the environment. On successful
initiation of a PAM session, we adopt the environment of the
PAM context.
|
|
|
|
This patch implements the new magic character '!'. By putting '!' in front
of a command, systemd executes it with full privileges ignoring parameters
such as User, Group, SupplementaryGroups, CapabilityBoundingSet,
AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext,
AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.
Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482
Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service
(You can use the example below)
3. sudo systemctl start exec-perm.service
4. Verify that the commands starting with '!' were not executed as bob,
4.1 Looking to the output of ls -l /tmp/exec-perm
4.2 Each file contains the result of the id command.
`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm
[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ;
/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ;
!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"
[Install]
WantedBy=multi-user.target
`````````````````````````````````````````````````````````````````
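How a leading '!' might be recognized on a command line can be sketched as below. This is only an illustration with a hypothetical function name; it ignores any other prefix characters the real parser handles.

```c
#include <stdbool.h>

/* Hypothetical parser step: reports whether command starts with '!'
 * and returns a pointer to the command with that prefix removed. */
static const char *strip_privileged_prefix(const char *command, bool *privileged) {
        *privileged = command[0] == '!';
        return *privileged ? command + 1 : command;
}
```

The caller would then skip applying settings such as User= or CapabilityBoundingSet= whenever the flag comes back true.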
|
|
an instance (#3451)
Corrects: 7aad67e7
Fixes: #3438
|
|
it (#3457)
Let's add an extra safety check before we chmod/chown a TTY to the right user,
as we might end up having connected something to STDIN/STDOUT that is actually
not a TTY, even though this might have been requested, due to permissive
StandardInput= settings or transient service activation with fds passed in.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=85255
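The extra safety check can be sketched as follows. The function name, the 0600 mode and the choice of -ENOTTY as the refusal code are assumptions made for this example, not taken from the patch.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: hands a terminal to uid, but only if fd really
 * is a TTY. Returns 0 on success or a negative errno-style code. */
static int chown_terminal_checked(int fd, uid_t uid) {
        if (!isatty(fd))
                return -ENOTTY;  /* not a terminal (or not a valid fd): leave it alone */

        if (fchown(fd, uid, (gid_t) -1) < 0)
                return -errno;

        if (fchmod(fd, 0600) < 0)
                return -errno;

        return 0;
}
```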
|
|
Let's explain #3444 briefly in the sources, too.
|
|
Without this code the following can happen:
1. Open a file to keep a mount busy
2. Try to stop the corresponding mount unit with systemctl
-> umount fails and the failure is remembered in mount->result
3. Close the file and umount the filesystem manually
-> mount_dispatch_io() calls "mount_enter_dead(mount, MOUNT_SUCCESS)"
-> Old error in mount->result is reused and the mount unit enters a
failed state
Clear the old error result when 'mountinfo' reports a successful umount to
fix this.
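The sequence above can be modelled with a tiny sketch. The types and function names below are simplified stand-ins for the real mount unit code, and the "a failure result sticks until cleared" rule is inferred from the description.

```c
/* Hypothetical, simplified stand-ins for the real types. */
typedef enum { MOUNT_SUCCESS, MOUNT_FAILURE_EXIT_CODE } MountResult;
typedef struct { MountResult result; } Mount;

/* Entering the dead state only ever records failures, so a stale
 * failure survives a later successful transition. */
static void mount_enter_dead_sketch(Mount *m, MountResult r) {
        if (r != MOUNT_SUCCESS)
                m->result = r;
}

/* Called when 'mountinfo' reports that the filesystem is gone. */
static void mount_on_mountinfo_unmounted(Mount *m) {
        m->result = MOUNT_SUCCESS;   /* forget the old umount failure */
        mount_enter_dead_sketch(m, MOUNT_SUCCESS);
}
```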
|
|
This basically reverts 7b2fd9d51259f6cf350791434e640ac3519acc6c ("core:
remove duplicate code in automount_update_mount()").
This was not duplicate code. The expire_tokens need to be handled as well:
Send 0 == success for MOUNT_DEAD (umount successful), do nothing for
MOUNT_UNMOUNTING (not yet done) and an error for everything else.
Otherwise the automount logic will assume unmounting is not done and will
not send any new requests for mounting. As a result, the corresponding
mount unit is never mounted.
Without this, automounts with TimeoutIdleSec= are broken. Once the idle
timeout triggered a umount, any access to the corresponding filesystem
hangs forever.
Fixes #3332.
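The three cases for the expire tokens can be written down as a small decision function. The enum is a trimmed-down stand-in for the real mount states, the function name is hypothetical, and -ENODEV is merely an assumed error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down set of mount states. */
typedef enum { MOUNT_DEAD, MOUNT_UNMOUNTING, MOUNT_MOUNTED, MOUNT_FAILED } MountState;

/* Decides the reply for pending expire tokens. Returns true and sets
 * *status when a reply should be sent, false when nothing is sent yet. */
static bool expire_reply_for_state(MountState state, int *status) {
        switch (state) {
        case MOUNT_DEAD:
                *status = 0;         /* umount succeeded */
                return true;
        case MOUNT_UNMOUNTING:
                return false;        /* not finished yet: send nothing */
        default:
                *status = -ENODEV;   /* assumed error code for every other state */
                return true;
        }
}
```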
|
|
New exec boolean MemoryDenyWriteExecute, when set, installs
a seccomp filter to reject mmap(2) with PROT_WRITE|PROT_EXEC
and mprotect(2) with PROT_EXEC.
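The two rules can be expressed as plain predicates over the protection bits, using the PROT_* constants from <sys/mman.h>. This is only a user-space illustration of the conditions with hypothetical function names; the actual filter is installed through seccomp and is not shown here.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* mmap(2): refuse mappings that are writable and executable at once. */
static bool mmap_prot_denied(int prot) {
        return (prot & PROT_WRITE) && (prot & PROT_EXEC);
}

/* mprotect(2): refuse any request that makes memory executable. */
static bool mprotect_prot_denied(int prot) {
        return (prot & PROT_EXEC) != 0;
}
```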
|
|
core: log cgroup legacy and unified hierarchy setting translations
|
|
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit. While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids. There's no point in introducing another term. Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.
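A rough sketch of a parser that accepts "infinity" but not the kernel's "max". The function name, the LIMIT_INFINITY sentinel and the plain decimal syntax are assumptions made for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LIMIT_INFINITY UINT64_MAX   /* hypothetical sentinel for "no upper limit" */

/* Hypothetical parser: returns 0 and stores the limit in *ret,
 * or -EINVAL if the string is not understood. */
static int parse_limit(const char *s, uint64_t *ret) {
        unsigned long long v;
        char *end;

        if (strcmp(s, "infinity") == 0) {
                *ret = LIMIT_INFINITY;
                return 0;
        }

        if (*s < '0' || *s > '9')
                return -EINVAL;      /* rejects "max", signs and empty strings */

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0 || *end != '\0')
                return -EINVAL;

        *ret = v;
        return 0;
}
```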
|
|
To accommodate changes in kernel interface, cgroup unified hierarchy support
added several configuration items which overlap with the existing resource
control settings and there is simple config translation between the overlapping
settings to ease the transition. Since it can become confusing why certain cgroup
knobs are being configured, this patch adds a master warning message which
is printed once when such translation is first used and logs each translation
with a debug message.
v2:
- Switched to log_unit*().
|
|
cgroup_context_apply() and friends take CGroupContext and cgroup path as input
and have no way of getting back to the associated Unit, and thus use the raw cgroup
path for logging. This makes the log messages difficult to track down.
There's no reason to avoid passing in Unit into these functions. Pass in Unit
and use log_unit*() instead.
While at it, make cgroup_context_apply(), which has no outside users, static.
Also, drop cgroup path from log messages where the path itself isn't too
interesting and can be easily obtained from the unit.
|
|
Implement sets of system calls to help construct system call
filters. A set starts with '@' to distinguish from a system call.
Closes: #3053, #3157
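Resolving a filter entry into either a single system call or a named set can be sketched like this. The set name "@example-clock" and its members are invented for the example and are not the sets defined by the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical set table; name and members are illustrative only. */
typedef struct {
        const char *name;
        const char *const *syscalls;   /* NULL-terminated list */
} SyscallSet;

static const char *const example_clock[] = { "settimeofday", "adjtimex", NULL };
static const SyscallSet example_sets[] = {
        { "@example-clock", example_clock },
};

/* Returns the member list for a "@set" entry, or NULL if the entry is
 * a plain system call name or an unknown set. */
static const char *const *syscall_set_lookup(const char *entry) {
        if (entry[0] != '@')
                return NULL;

        for (size_t i = 0; i < sizeof example_sets / sizeof example_sets[0]; i++)
                if (strcmp(entry, example_sets[i].name) == 0)
                        return example_sets[i].syscalls;

        return NULL;
}
```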
|
|
The current raw_clone function takes two arguments, the cloning flags and
a pointer to the stack for the cloned child. The raw cloning without
passing a "thread main" function does not make sense if a new stack is
specified, as it returns in both the parent and the child, which will fail
in the child as the stack is virgin. All uses of raw_clone indeed pass NULL
for the stack pointer which indicates that both processes should share the
stack address (so you had better not pass CLONE_VM).
This commit refactors the code to not require the caller to pass the stack
address, as NULL is the only sensible option. It also adds the magic code
needed to make raw_clone work on sparc64, which does not return 0 in %o0
for the child, but indicates the child process by setting %o1 to non-zero.
This refactoring is not plain aesthetic, because non-NULL stack addresses
need to get mangled before being passed to the clone syscall (you have to
apply STACK_BIAS), whereas NULL must not be mangled. Implementing the
conditional mangling of the stack address would needlessly complicate the
code.
raw_clone is moved to a separate header, because the burden of including
the assert machinery and sched.h shouldn't be applied to every user of
missing_syscalls.h
|