summaryrefslogtreecommitdiff
path: root/src/core/execute.c
AgeCommit message (Collapse)Author
2017-02-20core/execute: add (void)Zbigniew Jędrzejewski-Szmek
CID #778045.
2017-02-15tree-wide: add SD_ID128_MAKE_STR, remove LOG_MESSAGE_IDZbigniew Jędrzejewski-Szmek
Embedding sd_id128_t's in constant strings was rather cumbersome. We had SD_ID128_CONST_STR which returned a const char[], but it had two problems: - it wasn't possible to statically concatanate this array with a normal string - gcc wasn't really able to optimize this, and generated code to perform the "conversion" at runtime. Because of this, even our own code in coredumpctl wasn't using SD_ID128_CONST_STR. Add a new macro to generate a constant string: SD_ID128_MAKE_STR. It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition of the numbers, but in practice it is more convenient to use, and allows gcc to generate smarter code: $ size .libs/systemd{,-logind,-journald}{.old,} text data bss dec hex filename 1265204 149564 4808 1419576 15a938 .libs/systemd.old 1260268 149564 4808 1414640 1595f0 .libs/systemd 246805 13852 209 260866 3fb02 .libs/systemd-logind.old 240973 13852 209 255034 3e43a .libs/systemd-logind 146839 4984 34 151857 25131 .libs/systemd-journald.old 146391 4984 34 151409 24f71 .libs/systemd-journald It is also much easier to check if a certain binary uses a certain MESSAGE_ID: $ strings .libs/systemd.old|grep MESSAGE_ID MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x $ strings .libs/systemd|grep MESSAGE_ID MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27 MESSAGE_ID=b07a249cd024414a82dd00cd181378ff MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7 MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f MESSAGE_ID=d34d037fff1847e6ae669a370e694725 MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5 MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7 MESSAGE_ID=39f53479d3a045ac8e11786248231fbf MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d MESSAGE_ID=7b05ebc668384222baa8881179cfda54 MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
2017-02-12core: skip ReadOnlyPaths= and other permission-related mounts on ↵Lennart Poettering
PermissionsStartOnly= (#5309) ReadOnlyPaths=, ProtectHome=, InaccessiblePaths= and ProtectSystem= are about restricting access and little more, hence they should be disabled if PermissionsStartOnly= is used or ExecStart= lines are prefixed with a "+". Do that. (Note that we will still create namespaces and stuff, since that's about a lot more than just permissions. We'll simply disable the effect of the four options mentioned above, but nothing else mount related.) This also adds a test for this, to ensure this works as intended. No documentation updates, as the documentation are already vague enough to support the new behaviour ("If true, the permission-related execution options…"). We could clarify this further, but I think we might want to extend the switches' behaviour a bit more in future, hence leave it at this for now. Fixes: #5308
2017-02-09execute: set the right exit status for CHDIR vs. CHROOTLennart Poettering
Fixes: #5125
2017-02-09execute: use prefix_roota() where appropriateLennart Poettering
2017-02-09execute: set working directory to /root if User= is not set, but ↵Lennart Poettering
WorkingDirectory=~ is Or actually, try to to do the right thing depending on what is available: - If we know $HOME from User=, then use that. - If the UID for the service is 0, hardcode that WorkingDirectory=~ means WorkingDirectory=/root - In any other case (which will be the unprivileged --user case), use get_home_dir() to find the $HOME of the user we are running as. - Otherwise fail. Fixes: #5246 #5124
2017-02-09Revert "core/execute: set HOME, USER also for root users"Lennart Poettering
This reverts commit 8b89628a10af3863bfc97872912e9da4076a5929. This broke #5246
2017-02-07core: add RootImage= setting for using a specific image file as root ↵Lennart Poettering
directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.
2017-02-07core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in ↵Lennart Poettering
conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect on RootDirectory=, which it makes a ton times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)
2017-02-03core/execute: pass the username to utmp/wtmp databaseZbigniew Jędrzejewski-Szmek
Before previous commit, username would be NULL for root, and set only for other users. So the argument passed to utmp_put_init_process() would be "root" for other users and NULL for root. Seems strange. Instead, always pass the username if available.
2017-02-03core/execute: set HOME, USER also for root usersZbigniew Jędrzejewski-Szmek
This changes the environment for services running as root from: LANG=C.utf8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin INVOCATION_ID=ffbdec203c69499a9b83199333e31555 JOURNAL_STREAM=8:1614518 to LANG=C.utf8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin HOME=/root LOGNAME=root USER=root SHELL=/bin/sh INVOCATION_ID=15a077963d7b4ca0b82c91dc6519f87c JOURNAL_STREAM=8:1616718 Making the environment special for the root user complicates things unnecessarily. This change simplifies both our logic (by making the setting of the variables unconditional), and should also simplify the logic in services (particularly scripts). Fixes #5124.
2017-01-31core/execute.c: check asprintf return value in the usual fashionZbigniew Jędrzejewski-Szmek
This is unlikely to fail, but we cannot rely on asprintf return value on failure, so let's just be correct here. CID #1368227.
2017-01-31core/execute: reformat exec_context_named_iofds() for legibilityZbigniew Jędrzejewski-Szmek
2017-01-24core/execute: fix strv memleakZbigniew Jędrzejewski-Szmek
compile_read_write_paths() returns a normal strv from strv_copy(), and setup_namespace() uses it read-only, so we should use strv_free to deallocate.
2017-01-17Merge pull request #4991 from poettering/seccomp-fixZbigniew Jędrzejewski-Szmek
2017-01-17pid1: provide a more detailed error message when execution fails (#5074)Zbigniew Jędrzejewski-Szmek
Fixes #5000.
2017-01-17seccomp: rework seccomp code, to improve compat with some archsLennart Poettering
This substantially reworks the seccomp code, to ensure better compatibility with some architectures, including i386. So far we relied on libseccomp's internal handling of the multiple syscall ABIs supported on Linux. This is problematic however, as it does not define clear semantics if an ABI is not able to support specific seccomp rules we install. This rework hence changes a couple of things: - We no longer use seccomp_rule_add(), but only seccomp_rule_add_exact(), and fail the installation of a filter if the architecture doesn't support it. - We no longer rely on adding multiple syscall architectures to a single filter, but instead install a separate filter for each syscall architecture supported. This way, we can install a strict filter for x86-64, while permitting a less strict filter for i386. - All high-level filter additions are now moved from execute.c to seccomp-util.c, so that we can test them independently of the service execution logic. - Tests have been added for all types of our seccomp filters. - SystemCallFilters= and SystemCallArchitectures= are now implemented in independent filters and installation logic, as they semantically are very much independent of each other. Fixes: #4575
2016-12-13Merge pull request #4806 from poettering/keyring-initZbigniew Jędrzejewski-Szmek
set up a per-service session kernel keyring, and store the invocation ID in it
2016-12-14core: add ability to define arbitrary bind mounts for servicesLennart Poettering
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possess in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439
2016-12-13core: store the invocation ID in the per-service keyringLennart Poettering
Let's store the invocation ID in the per-service keyring as a root-owned key, with strict access rights. This has the advantage over the environment-based ID passing that it also works from SUID binaries (as they key cannot be overidden by unprivileged code starting them), in contrast to the secure_getenv() based mode. The invocation ID is now passed in three different ways to a service: - As environment variable $INVOCATION_ID. This is easy to use, but may be overriden by unprivileged code (which might be a bad or a good thing), which means it's incompatible with SUID code (see above). - As extended attribute on the service cgroup. This cannot be overriden by unprivileged code, and may be queried safely from "outside" of a service. However, it is incompatible with containers right now, as unprivileged containers generally cannot set xattrs on cgroupfs. - As "invocation_id" key in the kernel keyring. This has the benefit that the key cannot be changed by unprivileged service code, and thus is safe to access from SUID code (see above). But do note that service code can replace the session keyring with a fresh one that lacks the key. However in that case the key will not be owned by root, which is easily detectable. The keyring is also incompatible with containers right now, as it is not properly namespace aware (but this is being worked on), and thus most container managers mask the keyring-related system calls. Ideally we'd only have one way to pass the invocation ID, but the different ways all have limitations. The invocation ID hookup in journald is currently only available on the host but not in containers, due to the mentioned limitations. How to verify the new invocation ID in the keyring: # systemd-run -t /bin/sh Running as unit: run-rd917366c04f847b480d486017f7239d6.service Press ^] three times within 1s to disconnect TTY. # keyctl show Session Keyring 680208392 --alswrv 0 0 keyring: _ses 250926536 ----s-rv 0 0 \_ user: invocation_id # keyctl request user invocation_id 250926536 # keyctl read 250926536 16 bytes of data in key: 9c96317c ac64495a a42b9cd7 4f3ff96b # echo $INVOCATION_ID 9c96317cac64495aa42b9cd74f3ff96b # ^D This creates a new transient service runnint a shell. Then verifies the contents of the keyring, requests the invocation ID key, and reads its payload. For comparison the invocation ID as passed via the environment variable is also displayed.
2016-12-13core: run each system service with a fresh session keyringLennart Poettering
This patch ensures that each system service gets its own session kernel keyring automatically, and implicitly. Without this a keyring is allocated for it on-demand, but is then linked with the user's kernel keyring, which is OK behaviour for logged in users, but not so much for system services. With this change each service gets a session keyring that is specific to the service and ceases to exist when the service is shut down. The session keyring is not linked up with the user keyring and keys hence only search within the session boundaries by default. (This is useful in a later commit to store per-service material in the keyring, for example the invocation ID) (With input from David Howells)
2016-11-18Merge pull request #4538 from fbuihuu/confirm-spawn-fixesLennart Poettering
Confirm spawn fixes/enhancements
2016-11-17core: in confirm spawn, suggest 'f' when user selects 'n' choiceFranck Bui
2016-11-17core: confirm_spawn: always accept units with same_pgrp set for nowFranck Bui
For some reasons units remaining in the same process group as PID 1 (same_pgrp=true) fail to acquire the console even if it's not taken by anyone. So always accept for units with same_pgrp set for now.
2016-11-17core: include the unit name when notifying that a confirmation question ↵Franck Bui
timed out
2016-11-17core: add 'c' in confirmation_spawn to resume the boot processFranck Bui
2016-11-17core: add 'j' in confirmation_spawn to list the jobs that are in progressFranck Bui
2016-11-17core: add 'D' in confirmat spawn to show a full dump of the unit to spawnFranck Bui
2016-11-17core: add 'i' in confirm spawn to give a short summary of the unit to spawnFranck Bui
2016-11-17core: rework the confirmation spawn promptFranck Bui
Previously it was "[Yes, Fail, Skip]" which is pretty misleading because it suggests that the whole word needs to be entered instead of a single char. Also this won't fit well when we'll extend the number of choices. This patch addresses this by changing the choice hint with "[y, f, s – h for help]" so it's now clear that a single letter has to be entered. It also introduces a new choice 'h' which describes all possible choices since a single letter can be not descriptive enough for new users. It also allow to stick with the same hint string regardless of how many choices we will support.
2016-11-17core: limit the length of the confirmation questionFranck Bui
When "confirmation_spawn=1", the confirmation question can look like: Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/kmod.conf? [Yes, No, Skip] which is pretty verbose and might not fit in the console width size (which is usually 80 chars) and thus question will be splitted into 2 consecutive lines. However since the question is now refreshed every 2 secs, the reprinted question will overwrite the second line of the previous one... To prevent this, this patch makes sure that the command line won't be longer than 60 chars by ellipsizing it if the command is longer: Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/ru…nf? [Yes, No, View, Skip] A following patch will introduce a new choice that will allow the user to get details on the command to be executed so it will still be possible to see the full command line.
2016-11-17core: in confirm_spawn, the meaning of 'n' and 's' choices are confusingFranck Bui
Before this patch we had: - "no" which gives "failing execution" but the command is actually assumed as succeed. - "skip" which gives "skipping", but the command is assumed to have failed, which ends up with "Failed to start ..." on the console. Now we have: - "fail" which gives "failing execution" and the command is indeed assumed as failed. - "skip" which gives "skipping execution" and the command is assumed as succeed.
2016-11-17core: rework ask_for_confirmation()Franck Bui
Now the reponses are handled by ask_for_confirmation() as well as the report of any errors occuring during the process of retrieving the confirmation response. One benefit of this is that there's no need to open/close the console one more time when reporting error/status messages. The caller now just needs to care about the return values whose meanings are: - don't execute and pretend that the command failed - don't execute and pretend that the command succeeed - positive answer, execute the command Also some slight code reorganization and introduce write_confirm_error() and write_confirm_error_fd(). write_confim_message becomes unneeded.
2016-11-17core: allow to redirect confirmation messages to a different consoleFranck Bui
It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.
2016-11-15core: improve the logic that implies no new privilegesDjalal Harouni
The no_new_privileged_set variable is not used any more since commit 9b232d3241fcfbf60af that fixed another thing. So remove it. Also no need to check if we are under user manager, remove that part too.
2016-11-08core: on DynamicUser= make sure that protecting sensitive paths is enforced ↵Djalal Harouni
(#4596) This adds a variable that is always set to false to make sure that protect paths inside sandbox are always enforced and not ignored. The only case when it is set to true is on DynamicUser=no and RootDirectory=/chroot is set. This allows users to use more our sandbox features inside RootDirectory= The only exception is ProtectSystem=full|strict and when DynamicUser=yes is implied. Currently RootDirectory= is not fully compatible with these due to two reasons: * /chroot/usr|etc has to be present on ProtectSystem=full * /chroot// has to be a mount point on ProtectSystem=strict.
2016-11-08Merge pull request #4536 from poettering/seccomp-namespacesZbigniew Jędrzejewski-Szmek
core: add new RestrictNamespaces= unit file setting Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
2016-11-07Rename formats-util.h to format-util.hZbigniew Jędrzejewski-Szmek
We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.
2016-11-04core: add new RestrictNamespaces= unit file settingLennart Poettering
This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-11-03core: intialize user aux groups and SupplementaryGroups= when DynamicUser= ↵Djalal Harouni
is set Make sure that when DynamicUser= is set that we intialize the user supplementary groups and that we also support SupplementaryGroups= Fixes: https://github.com/systemd/systemd/issues/4539 Thanks Evgeny Vereshchagin (@evverx)
2016-11-02Merge pull request #4483 from poettering/exec-orderLennart Poettering
more seccomp fixes, and change of order of selinux/aa/smack and seccomp application on exec
2016-11-02core: initialize groups list before checking SupplementaryGroups= of a unit ↵Djalal Harouni
(#4533) Always initialize the supplementary groups of caller before checking the unit SupplementaryGroups= option. Fixes https://github.com/systemd/systemd/issues/4531
2016-11-02execute: apply seccomp filters after changing selinux/aa/smack contextsLennart Poettering
Seccomp is generally an unprivileged operation, changing security contexts is most likely associated with some form of policy. Moreover, while seccomp may influence our own flow of code quite a bit (much more than the security context change) make sure to apply the seccomp filters immediately before executing the binary to invoke. This also moves enforcement of NNP after the security context change, so that NNP cannot affect it anymore. (However, the security policy now has to permit the NNP change). This change has a good chance of breaking current SELinux/AA/SMACK setups, because the policy might not expect this change of behaviour. However, it's technically the better choice I think and should hence be applied. Fixes: #3993
2016-10-28Merge pull request #4495 from topimiettinen/block-shmat-execDjalal Harouni
seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
2016-10-27core: make unit argument const for apply seccomp functionsDjalal Harouni
2016-10-27core: lets apply working directory just after mount namespacesDjalal Harouni
This makes applying groups after applying the working directory, this may allow some flexibility but at same it is not a big deal since we don't execute or do anything between applying working directory and droping groups.
2016-10-27core: get the working directory value inside apply_working_directory()Djalal Harouni
Improve apply_working_directory() and lets get the current working directory inside of it.
2016-10-27core: move apply working directory code into its own apply_working_directory()Djalal Harouni
2016-10-27core: move the code that setups namespaces on its own functionDjalal Harouni