path: root/src/core/manager.c
Age | Commit message | Author
2016-10-21core: use emergency_action for ctrl+alt+del burstLukas Nykryn
Fixes #4306
2016-10-19pid1: downgrade some rlimit warningsZbigniew Jędrzejewski-Szmek
Since we ignore the result anyway, downgrade errors to warnings. log_oom() will still emit an error, but that's mostly theoretical, so it is not worth complicating the code to avoid the small inconsistency.
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128-bit ID is generated each time a unit moves from an inactive to an activating or active state.
The primary use case for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as an environment variable. It is additionally stored as an extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence it is the better choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycle of it.
Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as a hash from that rather than entirely randomly. This way we can derive the invocation ID race-freely from the messages.
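As a rough illustration of the environment-variable route described in this commit message, a service could read its own invocation ID along the following lines. This is only a sketch: it assumes the variable is named INVOCATION_ID and carries the 128-bit ID as 32 hexadecimal characters, and unhex(), parse_id128() and read_invocation_id() are hypothetical helpers, not part of sd-id128. Code that links against libsystemd would more likely call sd_id128_get_invocation() instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Convert one hexadecimal digit to its value, or return -1 if invalid. */
static int unhex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
}

/* Hypothetical helper: parse exactly 32 hex characters into 16 bytes.
 * Returns 0 on success, -1 on malformed or missing input. */
static int parse_id128(const char *s, uint8_t out[16]) {
        if (!s || strlen(s) != 32)
                return -1;

        for (size_t i = 0; i < 16; i++) {
                int hi = unhex(s[2 * i]), lo = unhex(s[2 * i + 1]);

                if (hi < 0 || lo < 0)
                        return -1;
                out[i] = (uint8_t) (hi << 4 | lo);
        }
        return 0;
}

/* Assumption: the service manager exports the ID as $INVOCATION_ID. */
static int read_invocation_id(uint8_t out[16]) {
        return parse_id128(getenv("INVOCATION_ID"), out);
}
```

Since a service can overwrite its own environment, a value obtained this way is informational only, which is the reason the commit message gives for having journald rely on the extended attribute.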
2016-10-07core: only warn on short reads on signal fdZbigniew Jędrzejewski-Szmek
2016-10-07manager: tighten incoming notification message checksLennart Poettering
Let's not accept datagrams with embedded NUL bytes. Previously we'd simply ignore everything after the first NUL byte. But given that sending us that is pretty ugly let's instead complain and refuse. With this change we'll only accept messages that have exactly zero or one NUL bytes at the very end of the datagram.
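The acceptance rule described here can be sketched as a small predicate. This is an illustrative reconstruction, not the code from the commit; notify_payload_is_acceptable() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: accept a datagram only if it contains no NUL byte
 * at all, or exactly one NUL byte as its very last byte. */
static bool notify_payload_is_acceptable(const char *buf, size_t n) {
        if (n == 0)
                return true;    /* nothing to scan in an empty datagram */

        /* A NUL anywhere before the final byte is an embedded NUL: refuse. */
        return memchr(buf, 0, n - 1) == NULL;
}
```

Scanning only the first n - 1 bytes lets a single trailing NUL through while rejecting everything else, including a datagram that ends in two NUL bytes.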
2016-10-07manager: be stricter with incoming notifications, warn properly about too large onesLennart Poettering
Let's make the kernel let us know the full, original datagram size of the incoming message. If it's larger than the buffer space provided by us, drop the whole message with a warning. Before this change the kernel would truncate the message for us to the buffer space provided, and we'd not complain about this, and simply process the incomplete message as far as it made sense.
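One way to obtain the original datagram size on Linux is to pass MSG_TRUNC to recvmsg(), which then returns the real length even when the datagram did not fit into the buffer. The sketch below assumes that mechanism; recv_whole_datagram() is a hypothetical helper, and the actual patch may differ in flags and error handling.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one datagram into buf. Returns the payload
 * length, -EMSGSIZE if the datagram did not fit, or -errno on failure. */
static ssize_t recv_whole_datagram(int fd, char *buf, size_t size) {
        struct iovec iov = { .iov_base = buf, .iov_len = size };
        struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };

        /* With MSG_TRUNC, Linux reports the real datagram length even if
         * it is larger than the buffer we passed in. */
        ssize_t n = recvmsg(fd, &mh, MSG_TRUNC);
        if (n < 0)
                return -errno;

        /* Also honour the MSG_TRUNC output flag, in case the kernel only
         * reports truncation that way. */
        if ((size_t) n > size || (mh.msg_flags & MSG_TRUNC))
                return -EMSGSIZE;       /* too large: drop it entirely */

        return n;
}
```

The oversized datagram is still dequeued by the call, so refusing it does not leave it pending on the socket.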
2016-10-07manager: don't ever busy loop when we get a notification message we can't processLennart Poettering
If the kernel doesn't permit us to dequeue/process an incoming notification datagram message it's still better to stop processing the notification messages altogether than to enter a busy loop where we keep getting notified but can't do a thing about it. With this change, manager_dispatch_notify_fd() behaviour is changed like this:
- if an error indicating a spurious wake-up is seen on recvmsg(), ignore it (EAGAIN/EINTR)
- if any other error is seen on recvmsg(), propagate it, thus disabling processing of further wakeups
- if any error is seen on later code in the function, warn about it but do not propagate it, as in this case we're not going to busy loop as the offending message is already dequeued.
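The first two rules for recvmsg() errors amount to a small decision function. The sketch below uses hypothetical names and only models the return-value convention: 0 keeps the event source enabled, a negative errno disables it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true for errors that merely indicate a spurious
 * wake-up and can be ignored. */
static bool recv_error_is_spurious(int err) {
        return err == EAGAIN || err == EINTR;
}

/* Hypothetical helper: map an errno from recvmsg() to the handler's return
 * value. Returning a negative value stands for "propagate", which disables
 * further processing instead of busy looping on an undequeuable message. */
static int handle_recvmsg_error(int err) {
        if (recv_error_is_spurious(err))
                return 0;

        return -err;
}
```

Errors that occur after the message has been dequeued would be logged and then mapped to 0 as well, since the offending datagram can no longer trigger another wake-up.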
2016-10-06core: add possibility to set action for ctrl-alt-del burst (#4105)Lukáš Nykrýn
For some certifications, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest that our customers mask the ctrl-alt-delete target, but that is obviously not enough. Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
2016-10-01core: do not try to create /run/systemd/transient in test modeZbigniew Jędrzejewski-Szmek
This prevented systemd-analyze from unprivileged operation on older systemd installations, which should be possible. Also, we shouldn't touch the file system in test mode even if we can.
2016-10-01core: update warning messageZbigniew Jędrzejewski-Szmek
"closing all" might suggest that _all_ fds received with the notification message will be closed. Reword the message to clarify that only the "unused" ones will be closed.
2016-10-01core: get rid of unneeded state variableZbigniew Jędrzejewski-Szmek
No functional change.
2016-09-29pid1: more informative error message for ignored notificationsZbigniew Jędrzejewski-Szmek
It's probably easier to diagnose a bad notification message if the contents are printed. But still, do anything only if debugging is on.
2016-09-29pid1: process zero-length notification messages againZbigniew Jędrzejewski-Szmek
This undoes 531ac2b234. I acked that patch without looking at the code carefully enough. There are two problems:
- we want to process the fds anyway
- in principle empty notification messages are valid, and we should process them as usual, including logging using log_unit_debug().
2016-09-29pid1: don't return any error in manager_dispatch_notify_fd() (#4240)Franck Bui
If manager_dispatch_notify_fd() fails and returns an error then the handling of service notifications will be disabled entirely leading to a compromised system. For example pid1 won't be able to receive the WATCHDOG messages anymore and will kill all services supposed to send such messages.
2016-09-29If the notification message length is 0, ignore the message (#4237)Jorge Niedbalski
Fixes #4234. Signed-off-by: Jorge Niedbalski <jnr@metaklass.org>
2016-09-09pid1: drop kdbus_fd and all associated logicZbigniew Jędrzejewski-Szmek
2016-08-22core: add Ref()/Unref() bus calls for unitsLennart Poettering
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).
2016-08-19Merge pull request #3965 from htejun/systemd-controller-on-unifiedZbigniew Jędrzejewski-Szmek
2016-08-19core: add RemoveIPC= settingLennart Poettering
This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). If turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, but we need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.
2016-08-17core: use the unified hierarchy for the systemd cgroup controller hierarchyTejun Heo
Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities.
Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logic and would allow using features which depend on the unified hierarchy membership, such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities.
This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers.
* cg_unified(@controller) is introduced, which tests whether the specific controller is on the unified hierarchy and is used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified().
* The "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of the legacy hierarchy for the systemd cgroup controller.
* nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, the unified hierarchy is used for all. If 0, legacy for all.
* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options: legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration.
* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host.
v2:
- CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode.
- Various style updates.
v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
2016-08-15core: rename cg_unified() to cg_all_unified()Tejun Heo
A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.
2016-08-03core: drop spurious newlineLennart Poettering
2016-07-25Merge pull request #3728 from poettering/dynamic-usersZbigniew Jędrzejewski-Szmek
2016-07-22core: add a concept of "dynamic" user ids, that are allocated as long as a service is runningLennart Poettering
This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is:
    systemd-run -p DynamicUser=1 /bin/sleep 99999
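The UID range quoted in this commit message can be expressed as a simple membership check. The constant and function names below are hypothetical and chosen for illustration; only the bounds 61184 and 65519 come from the text.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Bounds taken from the commit message; the names are assumptions. */
#define DYNAMIC_UID_MIN ((uid_t) 61184)
#define DYNAMIC_UID_MAX ((uid_t) 65519)

/* Hypothetical helper: true if uid lies in the dynamic allocation range. */
static bool uid_in_dynamic_range(uid_t uid) {
        return uid >= DYNAMIC_UID_MIN && uid <= DYNAMIC_UID_MAX;
}
```

A check like this could, for instance, flag files on disk whose owner falls into the dynamic range and may therefore be reassigned to a different service later.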
2016-07-22core: change TasksMax= default for system services to 15%Lennart Poettering
As it turns out, the old maximum of 512 tasks per service is hit by too many applications, hence let's bump it a bit, and make it relative to the system's maximum number of PIDs. With this change the new default is 15%. At the kernel's default pids_max value of 32768 this translates to 4915. At machined's default TasksMax= setting of 16384 this translates to 2457. Why 15%? Because it sounds like a round number and is close enough to 4096 which I was going for, i.e. an eight-fold increase over the old 512.
Summary:
            | on the host | in a container
old default | 512         | 512
new default | 4915        | 2457
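The two figures in the summary follow from integer arithmetic on the respective limits. A minimal sketch, assuming the percentage is applied with truncating division (tasks_max_from_percent() is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical helper: resolve a percentage against a task/PID limit.
 * Integer division rounds down, so 15% of 32768 (4915.2) becomes 4915.
 * Overflow is not handled; the limits discussed here are small. */
static uint64_t tasks_max_from_percent(uint64_t limit, unsigned percent) {
        return limit * percent / 100;
}
```

With a limit of 32768 this yields 4915 on the host, and with machined's 16384 it yields 2457 inside a container, matching the values quoted in the commit message.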
2016-07-21core: remove duplicate includes (#3771)Thomas H. P. Andersen
2016-07-16manager: don't skip sigchld handler for main and control pid for services (#3738)Lukáš Nykrýn
During stop, when a service has one "regular" pid, one main pid and one control pid, and the sigchld for the regular one is processed first, unit_tidy_watch_pids() will skip the main and control pids and not remove them from u->pids. But then we skip the sigchld event because we already did one in this iteration and there are two pids in u->pids. v2: Use the general unit_main_pid() and unit_control_pid() instead of reaching directly into the service structure.
2016-07-01manager: Fixing a debug printf formatting mistake (#3640)Kyle Walker
A 'llu' formatting statement was used in a debugging printf statement instead of a 'PRIu64'. Correcting that mistake here.
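For context, PRIu64 from <inttypes.h> expands to the correct conversion specifier for uint64_t on the target platform, whereas 'llu' is only right where uint64_t happens to be unsigned long long. A small sketch with a hypothetical format_iteration() helper:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit counter portably. The string
 * literal "iteration %" is concatenated with the PRIu64 macro. */
static int format_iteration(char *buf, size_t size, uint64_t iteration) {
        return snprintf(buf, size, "iteration %" PRIu64, iteration);
}
```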
2016-06-30manager: Only invoke a single sigchld per unit within a cleanup cycleKyle Walker
By default, each iteration of manager_dispatch_sigchld() results in a unit level sigchld event being invoked. For scope units, this results in a scope_sigchld_event() which can seemingly stall for workloads that have a large number of PIDs within the scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry as a result of pid_is_unwaited(). v2: This patch resolves this condition by only paying the cost of a sigchld in the underlying scope unit once per sigchld iteration. A new "sigchldgen" member resides within the Unit struct. The Manager's iteration value is incremented by the sd event loop and accessed via sd_event_get_iteration(), and the Unit member is set to the same value as the Manager's each time that a sigchld event is invoked. If the Manager iteration value and Unit member match, the sigchld event is not invoked for that iteration.
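The generation-counter idea can be sketched as follows. The Unit struct is reduced to the single member that matters here, and unit_sigchld_once_per_iteration() is a hypothetical name; in the real code the iteration value would come from sd_event_get_iteration().

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the real Unit struct (assumption). */
typedef struct Unit {
        uint64_t sigchldgen;    /* iteration of the last sigchld dispatch */
} Unit;

/* Hypothetical helper: returns true if the unit's sigchld event should be
 * invoked in this event loop iteration, and records that it was. */
static bool unit_sigchld_once_per_iteration(Unit *u, uint64_t iteration) {
        if (u->sigchldgen == iteration)
                return false;   /* already handled in this iteration */

        u->sigchldgen = iteration;
        return true;
}
```

Comparing against the loop's iteration counter avoids having to reset a per-unit flag at the end of every iteration.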
2016-06-18Ensure kdbus isn't used (#3501)Dave Reisner
Delete the dbus1 generator and some critical wiring. This prevents kdbus from being loaded or detected. As such, it will never be used, even if the user still has a useful kdbus module loaded on their system. Sort of fixes #3480. Not really, but it's better than the current state.
2016-06-14manager: reduce complexity of unit_gc_sweep (#3507)Lukáš Nykrýn
When a unit is marked as UNSURE, we try to find out whether its state changed over and over again. So let's not go through the UNSURE states again. Also, when we find a GOOD unit, let's propagate the GOOD state to all units that this unit references. This is a problem on machines with a lot of initscripts with different starting priority, since those units will reference each other and the original algorithm might get to n! complexity. Thanks HATAYAMA Daisuke for the expand_good_state code.
2016-06-10core: disable colors when displaying cylon when systemd.log_color=off (#3495)Franck Bui
2016-05-26manager: remove spurious newlineLennart Poettering
2016-05-16core: don't log job status message in case job was effectively NOP (#3199)Michal Sekletar
We currently generate a log message about a unit being started even when the unit was started already and the job didn't do anything. This is because the job was requested explicitly and hence became the anchor job of the transaction, thus we could not eliminate it. That is fine, but let's not pollute the journal with useless log messages.
    $ systemctl start systemd-resolved
    $ systemctl start systemd-resolved
    $ systemctl start systemd-resolved
Current state:
    $ journalctl -u systemd-resolved | grep Started
    May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution.
    May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution.
    May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution.
After patch applied:
    $ journalctl -u systemd-resolved | grep Started
    May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution.
Fixes #1723
2016-05-05tree-wide: introduce new SOCKADDR_UN_LEN() macro, and use it everywhereLennart Poettering
The macro determines the right length of an AF_UNIX "struct sockaddr_un" to pass to connect() or bind(). It automatically figures out if the socket refers to an abstract namespace socket, or a socket in the file system, and properly handles the full length of the path field. This macro is not only safer, but also simpler to use, than the usual offsetof() + strlen() logic.
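The length computation described here can be approximated with a plain function. This is a sketch of the idea, not the macro from the patch: sockaddr_un_len() is a hypothetical name, and it assumes that an abstract name ends at the first NUL byte after the leading one.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: length to pass to bind()/connect() for sa. */
static socklen_t sockaddr_un_len(const struct sockaddr_un *sa) {
        size_t path_len;

        if (sa->sun_path[0] == 0)
                /* Abstract namespace: the leading NUL plus the name. */
                path_len = 1 + strnlen(sa->sun_path + 1, sizeof(sa->sun_path) - 1);
        else
                /* File system socket: the path may fill the whole field,
                 * in which case there is no terminating NUL to find. */
                path_len = strnlen(sa->sun_path, sizeof(sa->sun_path));

        return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + path_len);
}
```

Using strnlen() bounded by the field size is what makes a path that fills sun_path completely safe to measure, which a plain strlen() would not be.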
2016-05-05core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notificationLennart Poettering
dbus-daemon currently uses a backlog of 30 on its D-Bus system bus socket. On overloaded systems this means that only 30 connections may be queued without dbus-daemon processing them before further connection attempts fail. Our cgroups-agent binary so far used D-Bus for its messaging, and hitting this limit hence may result in us losing cgroup empty messages. This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM. Since sockets of these types need no connection set up, no listen() backlog applies. Our cgroup-agent binary will hence simply block as long as it can't enqueue its datagram message, so that we won't lose cgroup empty messages as likely anymore. This also rearranges the ordering of the processing of SIGCHLD signals, service notification messages (sd_notify()...) and the two types of cgroup notifications (inotify for the unified hierarchy support, and agent for the classic hierarchy support). We now always process events for these in the following order:
1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7)
2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)
This is because when receiving SIGCHLD we invalidate PID information, which we need to process the service notification messages which are bound to PIDs. Hence the order between the first two items. And we want to process SIGCHLD metadata to detect whether a service is gone, before using cgroup notifications, to decide when a service is gone, since the former carries more useful metadata.
Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
https://github.com/systemd/systemd/issues/1961
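The effect of the three priority values can be illustrated by sorting event sources by priority, with numerically lower values dispatched first. The sketch below does not use sd-event: PRIORITY_NORMAL is a local stand-in assumed to match SD_EVENT_PRIORITY_NORMAL (0), and Source, source_compare() and sort_by_priority() are hypothetical.

```c
#include <stdlib.h>

/* Local stand-in for SD_EVENT_PRIORITY_NORMAL (assumed to be 0). */
#define PRIORITY_NORMAL 0

typedef struct Source {
        const char *name;
        int priority;   /* lower values are dispatched first */
} Source;

/* qsort() comparator ordering sources by ascending priority value. */
static int source_compare(const void *a, const void *b) {
        const Source *x = a, *y = b;

        return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Hypothetical helper: order sources the way they would be dispatched. */
static void sort_by_priority(Source *sources, size_t n) {
        qsort(sources, n, sizeof(Source), source_compare);
}
```

Given sources at PRIORITY_NORMAL-5, PRIORITY_NORMAL-7 and PRIORITY_NORMAL-6, the sorted order is notification messages, then SIGCHLD, then the cgroup notifications, as listed in the commit message.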
2016-04-22coredump,basic: generalize O_TMPFILE handling a bitLennart Poettering
This moves the O_TMPFILE handling from the coredumping code into common library code, and generalizes it as open_tmpfile_linkable() + link_tmpfile(). The existing open_tmpfile() function (which creates an unlinked temporary file that cannot be linked into the fs) is renamed to open_tmpfile_unlinkable(), to make the distinction clear. Thus, code may now choose between:
a) open_tmpfile_linkable() + link_tmpfile()
b) open_tmpfile_unlinkable()
depending on whether they want a file that may be linked back into the fs later on or not. In a later commit we should probably convert fopen_temporary() to make use of open_tmpfile_linkable().
Followup for: #3065
2016-04-12systemctl: don't confuse sysv code with generated unitsLennart Poettering
The SysV compat code checks whether there's a native unit file before looking for a SysV init script. Since the newest rework, generated units show up in the unit path, and hence the checks ended up assuming that there always was a native unit file for each init script: the generated one. With this change the generated unit file directory is suppressed from the search path when this check is done, to avoid the confusion.
2016-04-12install: rename generator_paths() → generator_binary_paths()Lennart Poettering
This is too confusing, as this function returns the paths to the generator binaries, while usually when we refer to just the "generator path" we mean the generated unit files. Let's clean this up.
2016-04-12core: move flushing of generated unit files to path-lookup.cLennart Poettering
It's very similar to the mkdir and trim operations for the generator dirs, hence let's unify this at a single place.
2016-04-12core: modernize manager_build_unit_path_cache() a bitLennart Poettering
2016-04-12core: rework logic to drop duplicate and non-existing items from search pathLennart Poettering
Move this into a function of its own, so that we can run it after we ran the generators, so that it takes into account removed generator dirs.
2016-04-12path-lookup: split out logic for mkdir/rmdir of generator dirs in their own functionsLennart Poettering
2016-04-12core: add a separate unit directory for transient unitsLennart Poettering
Previously, transient units were created below the normal runtime directory /run/systemd/system. With this change they are created in a special transient directory /run/systemd/transient, which only contains data for transient units. This clarifies the life-cycle of transient units, and makes clear they are distinct from user-provided runtime units. In particular, users may now extend transient units via /run/systemd/system, without systemd interfering with the life-cycle of these files. This change also adds code so that when a transient unit exits only the drop-ins in this new directory are removed, but nothing else. Fixes: #2139
2016-04-12core: reuse manager_get_runtime_prefix() at more placesLennart Poettering
2016-04-12core: introduce MANAGER_IS_RELOADING() macroLennart Poettering
This replaces the old function call manager_is_reloading_or_reexecuting() which was used only at very few places. Use the new macro wherever we check whether we are reloading. This should hopefully make things a bit more readable, given the nature of Manager:n_reloading being a counter.
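Because n_reloading is a counter rather than a flag, the check reduces to testing whether it is positive. A minimal sketch, with the Manager struct cut down to that one field (the struct layout is an assumption made for illustration):

```c
/* Minimal stand-in for the real Manager struct (assumption). */
typedef struct Manager {
        int n_reloading;        /* > 0 while at least one reload is in progress */
} Manager;

/* True while any reload is in progress; nested reloads keep it true until
 * the counter drops back to zero. */
#define MANAGER_IS_RELOADING(m) ((m)->n_reloading > 0)
```

A counter lets nested reload operations increment and decrement independently without one of them clearing the state for the others.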
2016-04-12core: remove ManagerRunningAs enumLennart Poettering
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were mostly identical and converted from one to the other all the time. The latter had one more value UNIT_FILE_GLOBAL however. Let's simplify things, and remove ManagerRunningAs and replace it by UnitFileScope everywhere, thus making the translation unnecessary. Introduce two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking if we are running in the system or the user context.
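A rough sketch of how the two macros could sit on top of a single scope enum follows. Only UNIT_FILE_GLOBAL and the macro names appear in the commit message; the other enum values, the unit_file_scope field name and the exact macro bodies are assumptions made for illustration.

```c
/* Enum values other than UNIT_FILE_GLOBAL are assumed names. */
typedef enum UnitFileScope {
        UNIT_FILE_SYSTEM,
        UNIT_FILE_GLOBAL,
        UNIT_FILE_USER,
} UnitFileScope;

/* Minimal stand-in for the real Manager struct (assumption). */
typedef struct Manager {
        UnitFileScope unit_file_scope;
} Manager;

/* A manager is either the system instance or some user instance, so the
 * two checks are written as exact complements of each other. */
#define MANAGER_IS_SYSTEM(m) ((m)->unit_file_scope == UNIT_FILE_SYSTEM)
#define MANAGER_IS_USER(m)   ((m)->unit_file_scope != UNIT_FILE_SYSTEM)
```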
2016-04-12core: rework generator dir logic, move the dirs into LookupPaths structureLennart Poettering
A long time ago, when generators were first introduced, the directories for them were randomly created via mkdtemp(). This was changed later so that they use fixed name directories now. Let's make use of this, and add the generator dirs to the LookupPaths structure and into the unit file search path maintained in it. This has the benefit that the generator dirs are now a normal part of the search path for all tools, and thus are shown in "systemctl list-unit-files" too.
2016-03-30core: improve error message when starting template without instanceLukas Nykryn