summaryrefslogtreecommitdiff
path: root/src/core/manager.c
AgeCommit message (Collapse)Author
2017-02-20manager: run environment generatorsZbigniew Jędrzejewski-Szmek
Environment file generators are a lot like unit file generators, but not exactly: 1. environment file generators are run for each manager instance, and their output is (or at least can be) individualized. The generators themselves are system-wide, the same for all users. 2. environment file generators are run sequentially, in priority order. Thus, the lifetime of those files is tied to lifecycle of the manager instance. Because generators are run sequentially, later generators can use or modify the output of earlier generators. Each generator is run with no arguments, and the whole state is stored in the environment variables. The generator can echo a set of variable assignments to standard output: VAR_A=something VAR_B=something else This output is parsed, and the next and subsequent generators run with those updated variables in the environment. After the last generator is done, the environment that the manager itself exports is updated. Each generator must return 0, otherwise the output is ignored. The generators in */user-env-generator are for the user session managers, including root, and the ones in */system-env-generator are for pid1.
2017-02-20core/manager: move environment serialization out to basic/env-util.cZbigniew Jędrzejewski-Szmek
This protocol is generally useful, we might just as well reuse it for the env. generators. The implementation is changed a bit: instead of making a new strv and freeing the old one, just mutate the original. This is much faster with larger arrays, while in fact atomicity is preserved, since we only either insert the new entry or not, without being in inconsistent state. v2: - fix confusion with return value
2017-02-20core/manager: fix grammar in commentZbigniew Jędrzejewski-Szmek
2017-02-20basic/exec-util: add support for synchronous (ordered) executionZbigniew Jędrzejewski-Szmek
The output of processes can be gathered, and passed back to the callee. (This commit just implements the basic functionality and tests.) After the preparation in previous commits, the change in functionality is relatively simple. For coding convenience, alarm is prepared *before* any children are executed, and not before. This shouldn't matter usually, since just forking of the children should be pretty quick. One could also argue that this is more correct, because we will also catch the case when (for whatever reason), forking itself is slow. Three callback functions and three levels of serialization are used: - from individual generator processes to the generator forker - from the forker back to the main process - deserialization in the main process v2: - replace an structure with an indexed array of callbacks
2017-02-20core/manager: split out creation of serialization fd out to a helperZbigniew Jędrzejewski-Szmek
There is a slight change in behaviour: the user manager for root will create a temporary file in /run/systemd, not /tmp. I don't think this matters, but simplifies implementation.
2017-02-11basic/util: move execute_directory() to separate fileZbigniew Jędrzejewski-Szmek
It's a fairly specialized function. Let's make new files for it and the tests.
2017-02-06core: use a memfd for serializationLennart Poettering
If we can, use a memfd for serializing state during a daemon reload or reexec. Fall back to a file in /run/systemd or /tmp only if memfds are not available. See: #5016
2017-02-06manager: refuse reloading/reexecing when /run is overly fullLennart Poettering
Let's add an extra safety check: before entering a reload/reexec, let's verify that there's enough room in /run for it. Fixes: #5016
2016-12-18core: downgrade "Time has been changed" to debug (#4906)Zbigniew Jędrzejewski-Szmek
That message is emitted by every systemd instance on every resume: Dec 06 08:03:38 laptop systemd[1]: Time has been changed Dec 06 08:03:38 laptop systemd[823]: Time has been changed Dec 06 08:03:38 laptop systemd[916]: Time has been changed Dec 07 08:00:32 laptop systemd[1]: Time has been changed Dec 07 08:00:32 laptop systemd[823]: Time has been changed Dec 07 08:00:32 laptop systemd[916]: Time has been changed -- Reboot -- Dec 07 08:02:46 laptop systemd[836]: Time has been changed Dec 07 08:02:46 laptop systemd[1]: Time has been changed Dec 07 08:02:46 laptop systemd[926]: Time has been changed Dec 07 19:48:12 laptop systemd[1]: Time has been changed Dec 07 19:48:12 laptop systemd[836]: Time has been changed Dec 07 19:48:12 laptop systemd[926]: Time has been changed ... Fixes #4896.
2016-12-11pid1,catalog: use a different MESSAGE_ID for user manager startupZbigniew Jędrzejewski-Szmek
This add a new message id for the end of user instance startup. User manager startup is a different beast then the system startup. Their descriptions are completely different too. Let's just separate them. Partially fixes #3351. Also remove "successful" from the description, since we don't know if the startup was successful or not.
2016-12-09tree-wide: replace all readdir cycles with FOREACH_DIRENT{,_ALL} (#4853)Reverend Homer
2016-11-18Merge pull request #4538 from fbuihuu/confirm-spawn-fixesLennart Poettering
Confirm spawn fixes/enhancements
2016-11-17core: add 'c' in confirmation_spawn to resume the boot processFranck Bui
2016-11-17core: allow to redirect confirmation messages to a different consoleFranck Bui
It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.
2016-11-17core: prevent the cylon when confirmation_spawn=yes (#2194)Franck Bui
When booting with systemd.confirm_spawn=true, the eye of cylon animation kicks in pretty quickly so user doesn't have any chance to answer the questions which services to start before the confirmation message is screwed by the cylon. This basically breaks the confirm_spawn functionality completely. This patch prevents the cylon animation to kick in when confirmation_spawn=yes. Fixes: #2194
2016-11-16core: GC redundant device jobs from the run queueLennart Poettering
In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works for fine all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921
2016-11-16core: drop n_in_gc_queue field of Manager structureLennart Poettering
We count the units in the GC queue with this, but actually never make use of it, hence drop it.
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-10-23tree-wide: drop NULL sentinel from strjoinZbigniew Jędrzejewski-Szmek
This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c) git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.
2016-10-22tree-wide: use startswith return value to avoid hardcoded offsetZbigniew Jędrzejewski-Szmek
I think it's an antipattern to have to count the number of bytes in the prefix by hand. We should do this automatically to avoid wasting programmer time, and possible errors. I didn't any offsets that were wrong, so this change is mostly to make future development easier.
2016-10-21core: use emergency_action for ctr+alt+del burstLukas Nykryn
Fixes #4306
2016-10-19pid1: downgrade some rlimit warningsZbigniew Jędrzejewski-Szmek
Since we ignore the result anyway, downgrade errors to warning. log_oom() will still emit an error, but that's mostly theoretical, so it is not worth complicating the code to avoid the small inconsistency
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
2016-10-07core: only warn on short reads on signal fdZbigniew Jędrzejewski-Szmek
2016-10-07manager: tighten incoming notification message checksLennart Poettering
Let's not accept datagrams with embedded NUL bytes. Previously we'd simply ignore everything after the first NUL byte. But given that sending us that is pretty ugly let's instead complain and refuse. With this change we'll only accept messages that have exactly zero or one NUL bytes at the very end of the datagram.
2016-10-07manager: be stricter with incomining notifications, warn properly about too ↵Lennart Poettering
large ones Let's make the kernel let us know the full, original datagram size of the incoming message. If it's larger than the buffer space provided by us, drop the whole message with a warning. Before this change the kernel would truncate the message for us to the buffer space provided, and we'd not complain about this, and simply process the incomplete message as far as it made sense.
2016-10-07manager: don't ever busy loop when we get a notification message we can't ↵Lennart Poettering
process If the kernel doesn't permit us to dequeue/process an incoming notification datagram message it's still better to stop processing the notification messages altogether than to enter a busy loop where we keep getting notified but can't do a thing about it. With this change, manager_dispatch_notify_fd() behaviour is changed like this: - if an error indicating a spurious wake-up is seen on recvmsg(), ignore it (EAGAIN/EINTR) - if any other error is seen on recvmsg() propagate it, thus disabling processing of further wakeups - if any error is seen on later code in the function, warn about it but do not propagate it, as in this cas we're not going to busy loop as the offending message is already dequeued.
2016-10-06core: add possibility to set action for ctrl-alt-del burst (#4105)Lukáš Nykrýn
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough. Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
2016-10-01core: do not try to create /run/systemd/transient in test modeZbigniew Jędrzejewski-Szmek
This prevented systemd-analyze from unprivileged operation on older systemd installations, which should be possible. Also, we shouldn't touch the file system in test mode even if we can.
2016-10-01core: update warning messageZbigniew Jędrzejewski-Szmek
"closing all" might suggest that _all_ fds received with the notification message will be closed. Reword the message to clarify that only the "unused" ones will be closed.
2016-10-01core: get rid of unneeded state variableZbigniew Jędrzejewski-Szmek
No functional change.
2016-09-29pid1: more informative error message for ignored notificationsZbigniew Jędrzejewski-Szmek
It's probably easier to diagnose a bad notification message if the contents are printed. But still, do anything only if debugging is on.
2016-09-29pid1: process zero-length notification messages againZbigniew Jędrzejewski-Szmek
This undoes 531ac2b234. I acked that patch without looking at the code carefully enough. There are two problems: - we want to process the fds anyway - in principle empty notification messages are valid, and we should process them as usual, including logging using log_unit_debug().
2016-09-29pid1: don't return any error in manager_dispatch_notify_fd() (#4240)Franck Bui
If manager_dispatch_notify_fd() fails and returns an error then the handling of service notifications will be disabled entirely leading to a compromised system. For example pid1 won't be able to receive the WATCHDOG messages anymore and will kill all services supposed to send such messages.
2016-09-29If the notification message length is 0, ignore the message (#4237)Jorge Niedbalski
Fixes #4234. Signed-off-by: Jorge Niedbalski <jnr@metaklass.org>
2016-09-09pid1: drop kdbus_fd and all associated logicZbigniew Jędrzejewski-Szmek
2016-08-22core: add Ref()/Unref() bus calls for unitsLennart Poettering
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).
2016-08-19Merge pull request #3965 from htejun/systemd-controller-on-unifiedZbigniew Jędrzejewski-Szmek
2016-08-19core: add RemoveIPC= settingLennart Poettering
This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.
2016-08-17core: use the unified hierarchy for the systemd cgroup controller hierarchyTejun Heo
Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
2016-08-15core: rename cg_unified() to cg_all_unified()Tejun Heo
A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.
2016-08-03core: drop spurious newlineLennart Poettering
2016-07-25Merge pull request #3728 from poettering/dynamic-usersZbigniew Jędrzejewski-Szmek
2016-07-22core: add a concept of "dynamic" user ids, that are allocated as long as a ↵Lennart Poettering
service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22core: change TasksMax= default for system services to 15%Lennart Poettering
As it turns out 512 is max number of tasks per service is hit by too many applications, hence let's bump it a bit, and make it relative to the system's maximum number of PIDs. With this change the new default is 15%. At the kernel's default pids_max value of 32768 this translates to 4915. At machined's default TasksMax= setting of 16384 this translates to 2457. Why 15%? Because it sounds like a round number and is close enough to 4096 which I was going for, i.e. an eight-fold increase over the old 512 Summary: | on the host | in a container old default | 512 | 512 new default | 4915 | 2457
2016-07-21core: remove duplicate includes (#3771)Thomas H. P. Andersen
2016-07-16manager: don't skip sigchld handler for main and control pid for services ↵Lukáš Nykrýn
(#3738) During stop when service has one "regular" pid one main pid and one control pid and the sighld for the regular one is processed first the unit_tidy_watch_pids will skip the main and control pid and does not remove them from u->pids(). But then we skip the sigchld event because we already did one in the iteration and there are two pids in u->pids. v2: Use general unit_main_pid() and unit_control_pid() instead of reaching directly to service structure.
2016-07-01manager: Fixing a debug printf formatting mistake (#3640)Kyle Walker
A 'llu' formatting statement was used in a debugging printf statement instead of a 'PRIu64'. Correcting that mistake here.
2016-06-30manager: Only invoke a single sigchld per unit within a cleanup cycleKyle Walker
By default, each iteration of manager_dispatch_sigchld() results in a unit level sigchld event being invoked. For scope units, this results in a scope_sigchld_event() which can seemingly stall for workloads that have a large number of PIDs within the scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry as a result of pid_is_unwaited(). v2: This patch resolves this condition by only paying to cost of a sigchld in the underlying scope unit once per sigchld iteration. A new "sigchldgen" member resides within the Unit struct. The Manager is incremented via the sd event loop, accessed via sd_event_get_iteration, and the Unit member is set to the same value as the manager each time that a sigchld event is invoked. If the Manager iteration value and Unit member match, the sigchld event is not invoked for that iteration.