summaryrefslogtreecommitdiff
path: root/src/core/execute.h
AgeCommit message (Collapse)Author
2017-02-07core: add RootImage= setting for using a specific image file as root ↵Lennart Poettering
directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.
2017-02-07core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in ↵Lennart Poettering
conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect on RootDirectory=, which it makes a ton times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)
2016-12-13Merge pull request #4806 from poettering/keyring-initZbigniew Jędrzejewski-Szmek
set up a per-service session kernel keyring, and store the invocation ID in it
2016-12-14core: add ability to define arbitrary bind mounts for servicesLennart Poettering
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possess in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439
2016-12-13core: run each system service with a fresh session keyringLennart Poettering
This patch ensures that each system service gets its own session kernel keyring automatically, and implicitly. Without this a keyring is allocated for it on-demand, but is then linked with the user's kernel keyring, which is OK behaviour for logged in users, but not so much for system services. With this change each service gets a session keyring that is specific to the service and ceases to exist when the service is shut down. The session keyring is not linked up with the user keyring and keys hence only search within the session boundaries by default. (This is useful in a later commit to store per-service material in the keyring, for example the invocation ID) (With input from David Howells)
2016-11-18Merge pull request #4538 from fbuihuu/confirm-spawn-fixesLennart Poettering
Confirm spawn fixes/enhancements
2016-11-17core: allow to redirect confirmation messages to a different consoleFranck Bui
It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.
2016-11-15core: improve the logic that implies no new privilegesDjalal Harouni
The no_new_privileged_set variable is not used any more since commit 9b232d3241fcfbf60af that fixed another thing. So remove it. Also no need to check if we are under user manager, remove that part too.
2016-11-04core: add new RestrictNamespaces= unit file settingLennart Poettering
This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.
2016-10-17core/exec: add a named-descriptor option ("fd") for streams (#4179)Luca Bruno
This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.
2016-10-12core:sandbox: Add ProtectKernelModules= optionDjalal Harouni
This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.
2016-09-25core: add two new service settings ProtectKernelTunables= and ↵Lennart Poettering
ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.
2016-08-19core: add RemoveIPC= settingLennart Poettering
This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.
2016-08-06Merge pull request #3884 from poettering/private-usersZbigniew Jędrzejewski-Szmek
2016-08-04core: only set the watchdog variables in ExecStart= linesLennart Poettering
2016-08-04core: set $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS in ↵Lennart Poettering
ExecStop=/ExecStopPost= commands This should simplify monitoring tools for services, by passing the most basic information about service result/exit information via environment variables, thus making it unnecessary to retrieve them explicitly via the bus.
2016-08-04core: move masking of chroot/permission masking into service_spawn()Lennart Poettering
Let's fix up the flags fields in service_spawn() rather than its callers, in order to simplify things a bit.
2016-08-04core: turn various execution flags into a proper flags parameterLennart Poettering
The ExecParameters structure contains a number of bit-flags, that were so far exposed as bool:1, change this to a proper, single binary bit flag field. This makes things a bit more expressive, and is helpful as we add more flags, since these booleans are passed around in various callers, for example service_spawn(), whose signature can be made much shorter now. Not all bit booleans from ExecParameters are moved into the flags field for now, but this can be added later.
2016-08-03core: add new PrivateUsers= option to service executionLennart Poettering
This setting adds minimal user namespacing support to a service. When set the invoked processes will run in their own user namespace. Only a trivial mapping will be set up: the root user/group is mapped to root, and the user/group of the service will be mapped to itself, everything else is mapped to nobody. If this setting is used the service runs with no capabilities on the host, but configurable capabilities within the service. This setting is particularly useful in conjunction with RootDirectory= as the need to synchronize /etc/passwd and /etc/group between the host and the service OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the user of the service itself. But even outside the RootDirectory= case this setting is useful to substantially reduce the attack surface of a service. Example command to test this: systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh This runs a shell as user "foobar". When typing "ps" only processes owned by "root", by "foobar", and by "nobody" should be visible.
2016-07-22core: add a concept of "dynamic" user ids, that are allocated as long as a ↵Lennart Poettering
service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-20core: normalize header inclusion in execute.h a bitLennart Poettering
We don't actually need any functionality from cgroup.h in execute.h, hence don't include that. However, we do need the Unit structure from unit.h, hence include that, and move it as late as possible, since it needs the definitions from execute.h.
2016-07-19doc,core: Read{Write,Only}Paths= and InaccessiblePaths=Alessandro Puccetti
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept as aliases but they are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths` `read_only_dirs` --> `read_only_paths` `inaccessible_dirs` --> `inaccessible_paths`
2016-07-11treewide: fix typos and remove accidental repetition of wordsTorstein Husebø
2016-06-23execute: add a new easy-to-use RestrictRealtime= option to unitsLennart Poettering
It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and SCHED_DEADLINE is blocked, which my be used to lock up the system.
2016-06-10core/execute: add the magic character '!' to allow privileged execution (#3493)Alessandro Puccetti
This patch implements the new magic character '!'. By putting '!' in front of a command, systemd executes it with full privileges ignoring paramters such as User, Group, SupplementaryGroups, CapabilityBoundingSet, AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext, AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies. Fixes partially https://github.com/systemd/systemd/issues/3414 Related to https://github.com/coreos/rkt/issues/2482 Testing: 1. Create a user 'bob' 2. Create the unit file /etc/systemd/system/exec-perm.service (You can use the example below) 3. sudo systemctl start ext-perm.service 4. Verify that the commands starting with '!' were not executed as bob, 4.1 Looking to the output of ls -l /tmp/exec-perm 4.2 Each file contains the result of the id command. ````````````````````````````````````````````````````````````````` [Unit] Description=ext-perm [Service] Type=oneshot TimeoutStartSec=0 User=bob ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ; /usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre" ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ; !/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2" ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post" ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload" ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop" ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post" [Install] WantedBy=multi-user.target] `````````````````````````````````````````````````````````````````
2016-06-03core: Restrict mmap and mprotect with PAGE_WRITE|PAGE_EXEC (#3319) (#3379)Topi Miettinen
New exec boolean MemoryDenyWriteExecute, when set, installs a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC and mprotect(2) with PAGE_EXEC.
2016-02-13core: drop Capabilities= settingLennart Poettering
The setting is hardly useful (since its effect is generally reduced to zero due to file system caps), and with the advent of ambient caps an actually useful replacement exists, hence let's get rid of this. I am pretty sure this was unused and our man page already recommended against its use, hence this should be a safe thing to remove.
2016-02-11Remove kdbus custom endpoint supportDaniel Mack
This feature will not be used anytime soon, so remove a bit of cruft. The BusPolicy= config directive will stay around as compat noop.
2016-02-10tree-wide: remove Emacs lines from all filesDaniel Mack
This should be handled fine now by .dir-locals.el, so need to carry that stuff in every file.
2016-01-28core: don't reset /dev/console if stdin/stdout/stderr as passed as fd in a ↵Lennart Poettering
transient service Otherwise we might end resetting /dev/console all the time when a transient service starts or stops. Fixes #2377 Fixes #2198 Fixes #2061
2016-01-12capabilities: added support for ambient capabilities.Ismo Puustinen
This patch adds support for ambient capabilities in service files. The idea with ambient capabilities is that the execed processes can run with non-root user and get some inherited capabilities, without having any need to add the capabilities to the executable file. You need at least Linux 4.3 to use ambient capabilities. SecureBit keep-caps is automatically added when you use ambient capabilities and wish to change the user. An example system service file might look like this: [Unit] Description=Service for testing caps [Service] ExecStart=/usr/bin/sleep 10000 User=nobody AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW After starting the service it has these capabilities: CapInh: 0000000000003000 CapPrm: 0000000000003000 CapEff: 0000000000003000 CapBnd: 0000003fffffffff CapAmb: 0000000000003000
2016-01-12capabilities: keep bounding set in non-inverted format.Ismo Puustinen
Change the capability bounding set parser and logic so that the bounding set is kept as a positive set internally. This means that the set reflects those capabilities that we want to keep instead of drop.
2015-11-18tree-wide: sort includes in *.hThomas Hindoe Paaboel Andersen
This is a continuation of the previous include sort patch, which only sorted for .c files.
2015-11-11execute: Add new PassEnvironment= directiveFilipe Brandenburger
This directive allows passing environment variables from the system manager to spawned services. Variables in the system manager can be set inside a container by passing `--set-env=...` options to systemd-spawn. Tested with an on-disk test.service unit. Tested using multiple variable names on a single line, with an empty setting to clear the current list of variables, with non-existing variables. Tested using `systemd-run -p PassEnvironment=VARNAME` to confirm it works with transient units. Confirmed that `systemctl show` will display the PassEnvironment settings. Checked that man pages are generated correctly. No regressions in `make check`.
2015-10-08core: add support for setting stdin/stdout/stderr for transient servicesLennart Poettering
When starting a transient service, allow setting stdin/stdout/stderr fds for it, by passing them in via the bus. This also simplifies some of the serialization code for units.
2015-10-06core: add support for naming file descriptors passed using socket activationLennart Poettering
This adds support for naming file descriptors passed using socket activation. The names are passed in a new $LISTEN_FDNAMES= environment variable, that matches the existign $LISTEN_FDS= one and contains a colon-separated list of names. This also adds support for naming fds submitted to the per-service fd store using FDNAME= in the sd_notify() message. This also adds a new FileDescriptorName= setting for socket unit files to set the name for fds created by socket units. This also adds a new call sd_listen_fds_with_names(), that is similar to sd_listen_fds(), but also returns the names of the fds. systemd-activate gained the new --fdname= switch to specify a name for testing socket activation. This is based on #1247 by Maciej Wereski. Fixes #1247.
2015-09-29core: allow setting WorkingDirectory= to the special value ~Lennart Poettering
If set to ~ the working directory is set to the home directory of the user configured in User=. This change also exposes the existing switch for the working directory that allowed making missing working directories non-fatal. This also changes "machinectl shell" to make use of this to ensure that the invoked shell is by default in the user's home directory. Fixes #1268.
2015-09-01core: unified cgroup hierarchy supportLennart Poettering
This patch set adds full support the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possibly to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified heirarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicey, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefuly makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.
2015-08-25execute: make the invalid entry of the enum -1Thomas Hindoe Paaboel Andersen
Set _EXEC_UTMP_MODE_INVALID to -1. This matches the return value from string_table_lookup.
2015-08-24core: optionally create LOGIN_PROCESS or USER_PROCESS utmp entriesLennart Poettering
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS and USER_PROCESS entries, instead of just a single INIT_PROCESS entry. With this change systemd may be used to not only invoke a getty directly in a SysV-compliant way but alternatively also a login(1) implementation or even forego getty and login entirely, and invoke arbitrary shells in a way that they appear in who(1) or w(1). This is preparation for a later commit that adds a "machinectl shell" operation to invoke a shell in a container, in a way that is compatible with who(1) and w(1).
2015-05-13Default to /usr/bin/u?mount, configurable, rather than hard-coded /bin/u?mount.Dimitri John Ledkov
2015-05-11core,network: major per-object logging reworkLennart Poettering
This changes log_unit_info() (and friends) to take a real Unit* object insted of just a unit name as parameter. The call will now prefix all logged messages with the unit name, thus allowing the unit name to be dropped from the various passed romat strings, simplifying invocations drastically, and unifying log output across messages. Also, UNIT= vs. USER_UNIT= is now derived from the Manager object attached to the Unit object, instead of getpid(). This has the benefit of correcting the field for --test runs. Also contains a couple of other logging improvements: - Drops a couple of strerror() invocations in favour of using %m. - Not only .mount units now warn if a symlinks exist for the mount point already, .automount units do that too, now. - A few invocations of log_struct() that didn't actually pass any additional structured data have been replaced by simpler invocations of log_unit_info() and friends. - For structured data a new LOG_UNIT_MESSAGE() macro has been added, that works like LOG_MESSAGE() but prefixes the message with the unit name. Similar, there's now LOG_LINK_MESSAGE() and LOG_NETDEV_MESSAGE(). - For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(), LOG_NETDEV_INTERFACE() macros have been added that generate the necessary per object fields. The old log_unit_struct() call has been removed in favour of these new macros used in raw log_struct() invocations. In addition to removing one more function call this allows generated structured log messages that contain two object fields, as necessary for example for network interfaces that are joined into another network interface, and whose messages shall be indexed by both. - The LOG_ERRNO() macro has been removed, in favour of log_struct_errno(). The latter has the benefit of ensuring that %m in format strings is properly resolved to the specified error number. - A number of logging messages have been converted to use log_unit_info() instead of log_info() - The client code in sysv-generator no longer #includes core code from src/core/. - log_unit_full_errno() has been removed, log_unit_full() instead takes an errno now, too. - log_unit_info(), log_link_info(), log_netdev_info() and friends, now avoid double evaluation of their parameters
2015-02-23remove unused includesThomas Hindoe Paaboel Andersen
This patch removes includes that are not used. The removals were found with include-what-you-use which checks if any of the symbols from a header is in use.
2015-02-12Add missing includes in header filesThomas Hindoe Paaboel Andersen
This fixes various issues found by globally reordering the include sections of all .c files.
2015-02-12core: don't fail to run services in --user instances if $HOME is missingLennart Poettering
Otherwise we cannot even invoke systemd-exit.service anymore, thus not even exit. https://bugs.freedesktop.org/show_bug.cgi?id=83100 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=759320
2014-12-18core: make exec_command_free_list return NULLZbigniew Jędrzejewski-Szmek
2014-11-24smack: introduce new SmackProcessLabel optionWaLyong Cho
In service file, if the file has some of special SMACK label in ExecStart= and systemd has no permission for the special SMACK label then permission error will occurred. To resolve this, systemd should be able to set its SMACK label to something accessible of ExecStart=. So introduce new SmackProcessLabel. If label is specified with SmackProcessLabel= then the child systemd will set its label to that. To successfully execute the ExecStart=, accessible label should be specified with SmackProcessLabel=. Additionally, by SMACK policy, if the file in ExecStart= has no SMACK64EXEC then the executed process will have given label by SmackProcessLabel=. But if the file has SMACK64EXEC then the SMACK64EXEC label will be overridden. [zj: reword man page]
2014-11-05core: introduce new Delegate=yes/no property controlling creation of cgroup ↵Lennart Poettering
subhierarchies For priviliged units this resource control property ensures that the processes have all controllers systemd manages enabled. For unpriviliged services (those with User= set) this ensures that access rights to the service cgroup is granted to the user in question, to create further subgroups. Note that this only applies to the name=systemd hierarchy though, as access to other controllers is not safe for unpriviliged processes. Delegate=yes should be set for container scopes where a systemd instance inside the container shall manage the hierarchies below its own cgroup and have access to all controllers. Delegate=yes should also be set for user@.service, so that systemd --user can run, controlling its own cgroup tree. This commit changes machined, systemd-nspawn@.service and user@.service to set this boolean, in order to ensure that container management will just work, and the user systemd instance can run fine.
2014-10-17environment: append unit_id to error messages regarding EnvironmentFileLukas Nykryn
2014-09-29swap: introduce Discard propertyJan Synacek
Process possible "discard" values from /etc/fstab.