summaryrefslogtreecommitdiff
path: root/src/core/manager.c
AgeCommit message (Collapse)Author
2017-02-23Merge pull request #4670 from htejun/systemd-controller-on-unified-v2Lennart Poettering
Systemd controller on unified v2
2017-02-21Merge pull request #5131 from keszybz/environment-generatorsLennart Poettering
Environment generators
2017-02-20manager: run environment generatorsZbigniew Jędrzejewski-Szmek
Environment file generators are a lot like unit file generators, but not exactly: 1. environment file generators are run for each manager instance, and their output is (or at least can be) individualized. The generators themselves are system-wide, the same for all users. 2. environment file generators are run sequentially, in priority order. Thus, the lifetime of those files is tied to lifecycle of the manager instance. Because generators are run sequentially, later generators can use or modify the output of earlier generators. Each generator is run with no arguments, and the whole state is stored in the environment variables. The generator can echo a set of variable assignments to standard output: VAR_A=something VAR_B=something else This output is parsed, and the next and subsequent generators run with those updated variables in the environment. After the last generator is done, the environment that the manager itself exports is updated. Each generator must return 0, otherwise the output is ignored. The generators in */user-env-generator are for the user session managers, including root, and the ones in */system-env-generator are for pid1.
2017-02-20core/manager: move environment serialization out to basic/env-util.cZbigniew Jędrzejewski-Szmek
This protocol is generally useful, we might just as well reuse it for the env. generators. The implementation is changed a bit: instead of making a new strv and freeing the old one, just mutate the original. This is much faster with larger arrays, while in fact atomicity is preserved, since we only either insert the new entry or not, without being in inconsistent state. v2: - fix confusion with return value
2017-02-20core/manager: fix grammar in commentZbigniew Jędrzejewski-Szmek
2017-02-20basic/exec-util: add support for synchronous (ordered) executionZbigniew Jędrzejewski-Szmek
The output of processes can be gathered, and passed back to the callee. (This commit just implements the basic functionality and tests.) After the preparation in previous commits, the change in functionality is relatively simple. For coding convenience, alarm is prepared *before* any children are executed, and not before. This shouldn't matter usually, since just forking of the children should be pretty quick. One could also argue that this is more correct, because we will also catch the case when (for whatever reason), forking itself is slow. Three callback functions and three levels of serialization are used: - from individual generator processes to the generator forker - from the forker back to the main process - deserialization in the main process v2: - replace an structure with an indexed array of callbacks
2017-02-20core/manager: split out creation of serialization fd out to a helperZbigniew Jędrzejewski-Szmek
There is a slight change in behaviour: the user manager for root will create a temporary file in /run/systemd, not /tmp. I don't think this matters, but simplifies implementation.
2017-02-18core: simplify cg_[all_]unified()Tejun Heo
cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchrnously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().
2017-02-15tree-wide: add SD_ID128_MAKE_STR, remove LOG_MESSAGE_IDZbigniew Jędrzejewski-Szmek
Embedding sd_id128_t's in constant strings was rather cumbersome. We had SD_ID128_CONST_STR which returned a const char[], but it had two problems: - it wasn't possible to statically concatanate this array with a normal string - gcc wasn't really able to optimize this, and generated code to perform the "conversion" at runtime. Because of this, even our own code in coredumpctl wasn't using SD_ID128_CONST_STR. Add a new macro to generate a constant string: SD_ID128_MAKE_STR. It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition of the numbers, but in practice it is more convenient to use, and allows gcc to generate smarter code: $ size .libs/systemd{,-logind,-journald}{.old,} text data bss dec hex filename 1265204 149564 4808 1419576 15a938 .libs/systemd.old 1260268 149564 4808 1414640 1595f0 .libs/systemd 246805 13852 209 260866 3fb02 .libs/systemd-logind.old 240973 13852 209 255034 3e43a .libs/systemd-logind 146839 4984 34 151857 25131 .libs/systemd-journald.old 146391 4984 34 151409 24f71 .libs/systemd-journald It is also much easier to check if a certain binary uses a certain MESSAGE_ID: $ strings .libs/systemd.old|grep MESSAGE_ID MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x $ strings .libs/systemd|grep MESSAGE_ID MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27 MESSAGE_ID=b07a249cd024414a82dd00cd181378ff MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7 MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f MESSAGE_ID=d34d037fff1847e6ae669a370e694725 MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5 MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7 MESSAGE_ID=39f53479d3a045ac8e11786248231fbf MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d MESSAGE_ID=7b05ebc668384222baa8881179cfda54 MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
2017-02-12core/manager: silence gcc warning about unitialized variableZbigniew Jędrzejewski-Szmek
At -O3, this was printed a hundred times for various callers of manager_add_job_by_name(). AFAICT, there is no error and `unit` is always intialized. Nevertheless, add explicit initialization to silence the noise. src/core/manager.c: In function 'manager_start_target': src/core/manager.c:1413:16: warning: 'unit' may be used uninitialized in this function [-Wmaybe-uninitialized] return manager_add_job(m, type, unit, mode, e, ret); ^ src/core/manager.c:1401:15: note: 'unit' was declared here Unit *unit; ^
2017-02-12core/manager: make manager_load_unit*() functions always take output argZbigniew Jędrzejewski-Szmek
We were inconsistent, manager_load_unit_prepare() would crash if _ret was ever NULL. But none of the callers use NULL. So simplify things and require it to be non-NULL.
2017-02-11basic/util: move execute_directory() to separate fileZbigniew Jędrzejewski-Szmek
It's a fairly specialized function. Let's make new files for it and the tests.
2017-02-06core: use a memfd for serializationLennart Poettering
If we can, use a memfd for serializing state during a daemon reload or reexec. Fall back to a file in /run/systemd or /tmp only if memfds are not available. See: #5016
2017-02-06manager: refuse reloading/reexecing when /run is overly fullLennart Poettering
Let's add an extra safety check: before entering a reload/reexec, let's verify that there's enough room in /run for it. Fixes: #5016
2016-12-18core: downgrade "Time has been changed" to debug (#4906)Zbigniew Jędrzejewski-Szmek
That message is emitted by every systemd instance on every resume: Dec 06 08:03:38 laptop systemd[1]: Time has been changed Dec 06 08:03:38 laptop systemd[823]: Time has been changed Dec 06 08:03:38 laptop systemd[916]: Time has been changed Dec 07 08:00:32 laptop systemd[1]: Time has been changed Dec 07 08:00:32 laptop systemd[823]: Time has been changed Dec 07 08:00:32 laptop systemd[916]: Time has been changed -- Reboot -- Dec 07 08:02:46 laptop systemd[836]: Time has been changed Dec 07 08:02:46 laptop systemd[1]: Time has been changed Dec 07 08:02:46 laptop systemd[926]: Time has been changed Dec 07 19:48:12 laptop systemd[1]: Time has been changed Dec 07 19:48:12 laptop systemd[836]: Time has been changed Dec 07 19:48:12 laptop systemd[926]: Time has been changed ... Fixes #4896.
2016-12-11pid1,catalog: use a different MESSAGE_ID for user manager startupZbigniew Jędrzejewski-Szmek
This add a new message id for the end of user instance startup. User manager startup is a different beast then the system startup. Their descriptions are completely different too. Let's just separate them. Partially fixes #3351. Also remove "successful" from the description, since we don't know if the startup was successful or not.
2016-12-09tree-wide: replace all readdir cycles with FOREACH_DIRENT{,_ALL} (#4853)Reverend Homer
2016-11-18Merge pull request #4538 from fbuihuu/confirm-spawn-fixesLennart Poettering
Confirm spawn fixes/enhancements
2016-11-17core: add 'c' in confirmation_spawn to resume the boot processFranck Bui
2016-11-17core: allow to redirect confirmation messages to a different consoleFranck Bui
It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.
2016-11-17core: prevent the cylon when confirmation_spawn=yes (#2194)Franck Bui
When booting with systemd.confirm_spawn=true, the eye of cylon animation kicks in pretty quickly so user doesn't have any chance to answer the questions which services to start before the confirmation message is screwed by the cylon. This basically breaks the confirm_spawn functionality completely. This patch prevents the cylon animation to kick in when confirmation_spawn=yes. Fixes: #2194
2016-11-16core: GC redundant device jobs from the run queueLennart Poettering
In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works for fine all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921
2016-11-16core: drop n_in_gc_queue field of Manager structureLennart Poettering
We count the units in the GC queue with this, but actually never make use of it, hence drop it.
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-10-23tree-wide: drop NULL sentinel from strjoinZbigniew Jędrzejewski-Szmek
This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c) git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.
2016-10-22tree-wide: use startswith return value to avoid hardcoded offsetZbigniew Jędrzejewski-Szmek
I think it's an antipattern to have to count the number of bytes in the prefix by hand. We should do this automatically to avoid wasting programmer time, and possible errors. I didn't any offsets that were wrong, so this change is mostly to make future development easier.
2016-10-21core: use emergency_action for ctr+alt+del burstLukas Nykryn
Fixes #4306
2016-10-19pid1: downgrade some rlimit warningsZbigniew Jędrzejewski-Szmek
Since we ignore the result anyway, downgrade errors to warning. log_oom() will still emit an error, but that's mostly theoretical, so it is not worth complicating the code to avoid the small inconsistency
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
2016-10-07core: only warn on short reads on signal fdZbigniew Jędrzejewski-Szmek
2016-10-07manager: tighten incoming notification message checksLennart Poettering
Let's not accept datagrams with embedded NUL bytes. Previously we'd simply ignore everything after the first NUL byte. But given that sending us that is pretty ugly let's instead complain and refuse. With this change we'll only accept messages that have exactly zero or one NUL bytes at the very end of the datagram.
2016-10-07manager: be stricter with incomining notifications, warn properly about too ↵Lennart Poettering
large ones Let's make the kernel let us know the full, original datagram size of the incoming message. If it's larger than the buffer space provided by us, drop the whole message with a warning. Before this change the kernel would truncate the message for us to the buffer space provided, and we'd not complain about this, and simply process the incomplete message as far as it made sense.
2016-10-07manager: don't ever busy loop when we get a notification message we can't ↵Lennart Poettering
process If the kernel doesn't permit us to dequeue/process an incoming notification datagram message it's still better to stop processing the notification messages altogether than to enter a busy loop where we keep getting notified but can't do a thing about it. With this change, manager_dispatch_notify_fd() behaviour is changed like this: - if an error indicating a spurious wake-up is seen on recvmsg(), ignore it (EAGAIN/EINTR) - if any other error is seen on recvmsg() propagate it, thus disabling processing of further wakeups - if any error is seen on later code in the function, warn about it but do not propagate it, as in this cas we're not going to busy loop as the offending message is already dequeued.
2016-10-06core: add possibility to set action for ctrl-alt-del burst (#4105)Lukáš Nykrýn
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough. Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
2016-10-01core: do not try to create /run/systemd/transient in test modeZbigniew Jędrzejewski-Szmek
This prevented systemd-analyze from unprivileged operation on older systemd installations, which should be possible. Also, we shouldn't touch the file system in test mode even if we can.
2016-10-01core: update warning messageZbigniew Jędrzejewski-Szmek
"closing all" might suggest that _all_ fds received with the notification message will be closed. Reword the message to clarify that only the "unused" ones will be closed.
2016-10-01core: get rid of unneeded state variableZbigniew Jędrzejewski-Szmek
No functional change.
2016-09-29pid1: more informative error message for ignored notificationsZbigniew Jędrzejewski-Szmek
It's probably easier to diagnose a bad notification message if the contents are printed. But still, do anything only if debugging is on.
2016-09-29pid1: process zero-length notification messages againZbigniew Jędrzejewski-Szmek
This undoes 531ac2b234. I acked that patch without looking at the code carefully enough. There are two problems: - we want to process the fds anyway - in principle empty notification messages are valid, and we should process them as usual, including logging using log_unit_debug().
2016-09-29pid1: don't return any error in manager_dispatch_notify_fd() (#4240)Franck Bui
If manager_dispatch_notify_fd() fails and returns an error then the handling of service notifications will be disabled entirely leading to a compromised system. For example pid1 won't be able to receive the WATCHDOG messages anymore and will kill all services supposed to send such messages.
2016-09-29If the notification message length is 0, ignore the message (#4237)Jorge Niedbalski
Fixes #4234. Signed-off-by: Jorge Niedbalski <jnr@metaklass.org>
2016-09-09pid1: drop kdbus_fd and all associated logicZbigniew Jędrzejewski-Szmek
2016-08-22core: add Ref()/Unref() bus calls for unitsLennart Poettering
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).
2016-08-19Merge pull request #3965 from htejun/systemd-controller-on-unifiedZbigniew Jędrzejewski-Szmek
2016-08-19core: add RemoveIPC= settingLennart Poettering
This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.
2016-08-17core: use the unified hierarchy for the systemd cgroup controller hierarchyTejun Heo
Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
2016-08-15core: rename cg_unified() to cg_all_unified()Tejun Heo
A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.
2016-08-03core: drop spurious newlineLennart Poettering
2016-07-25Merge pull request #3728 from poettering/dynamic-usersZbigniew Jędrzejewski-Szmek