summaryrefslogtreecommitdiff
path: root/man/systemd.exec.xml
AgeCommit message (Collapse)Author
2016-12-06man: fix $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS documentationJouke Witteveen
Note that any exit code is available through $EXIT_STATUS and not through $EXIT_CODE. This mimics siginfo.
2016-11-29bus-util: add protocol error type explanationJouke Witteveen
2016-11-23man: document protocol error type for service failures (#4724)Jouke Witteveen
2016-11-21seccomp: add @filesystem syscall group (#4537)Lennart Poettering
@filesystem groups various file system operations, such as opening files and directories for read/write and stat()ing them, plus renaming, deleting, symlinking, hardlinking.
2016-11-17namespace: simplify, optimize and extend handling of mounts for namespaceLennart Poettering
This changes a couple of things in the namespace handling: It merges the BindMount and TargetMount structures. They are mostly the same, hence let's just use the same structue, and rely on C's implicit zero initialization of partially initialized structures for the unneeded fields. This reworks memory management of each entry a bit. It now contains one "const" and one "malloc" path. We use the former whenever we can, but use the latter when we have to, which is the case when we have to chase symlinks or prefix a root directory. This means in the common case we don't actually need to allocate any dynamic memory. To make this easy to use we add an accessor function bind_mount_path() which retrieves the right path string from a BindMount structure. While we are at it, also permit "+" as prefix for dirs configured with ReadOnlyPaths= and friends: if specified the root directory of the unit is implicited prefixed. This also drops set_bind_mount() and uses C99 structure initialization instead, which I think is more readable and clarifies what is being done. This drops append_protect_kernel_tunables() and append_protect_kernel_modules() as append_static_mounts() is now simple enough to be called directly. Prefixing with the root dir is now done in an explicit step in prefix_where_needed(). It will prepend the root directory on each entry that doesn't have it prefixed yet. The latter is determined depending on an extra bit in the BindMount structure.
2016-11-15doc: move ProtectKernelModules= documentation near ProtectKernelTunalbes=Djalal Harouni
2016-11-15doc: note when no new privileges is impliedDjalal Harouni
2016-11-04core: add new RestrictNamespaces= unit file settingLennart Poettering
This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.
2016-11-03Merge pull request #4548 from keszybz/seccomp-helpZbigniew Jędrzejewski-Szmek
systemd-analyze syscall-filter
2016-11-03doc: clarify NoNewPrivileges (#4562)Kees Cook
Setting no_new_privs does not stop UID changes, but rather blocks gaining privileges through execve(). Also fixes a small typo.
2016-11-03seccomp-util, analyze: export comments as a help stringZbigniew Jędrzejewski-Szmek
Just to make the whole thing easier for users.
2016-11-03analyze: add syscall-filter verbZbigniew Jędrzejewski-Szmek
This should make it easier for users to understand what each filter means as the list of syscalls is updated in subsequent systemd versions.
2016-11-02man: document that too strict system call filters may affect the service managerLennart Poettering
If execve() or socket() is filtered the service manager might get into trouble executing the service binary, or handling any failures when this fails. Mention this in the documentation. The other option would be to implicitly whitelist all system calls that are required for these codepaths. However, that appears less than desirable as this would mean socket() and many related calls have to be whitelisted unconditionally. As writing system call filters requires a certain level of expertise anyway it sounds like the better option to simply document these issues and suggest that the user disables system call filters in the service temporarily in order to debug any such failures. See: #3993.
2016-11-02seccomp: add two new syscall groupsLennart Poettering
@resources contains various syscalls that alter resource limits and memory and scheduling parameters of processes. As such they are good candidates to block for most services. @basic-io contains a number of basic syscalls for I/O, similar to the list seccomp v1 permitted but slightly more complete. It should be useful for building basic whitelisting for minimal sandboxes
2016-11-02man: two minor fixesLennart Poettering
2016-11-02seccomp: include pipes and memfd in @ipcLennart Poettering
These system calls clearly fall in the @ipc category, hence should be listed there, simply to avoid confusion and surprise by the user.
2016-11-02seccomp: drop execve() from @process listLennart Poettering
The system call is already part in @default hence implicitly allowed anyway. Also, if it is actually blocked then systemd couldn't execute the service in question anymore, since the application of seccomp is immediately followed by it.
2016-11-02seccomp: add clock query and sleeping syscalls to "@default" groupLennart Poettering
Timing and sleep are so basic operations, it makes very little sense to ever block them, hence don't.
2016-11-01seccomp: allow specifying arm64, mips, ppc (#4491)Zbigniew Jędrzejewski-Szmek
"Secondary arch" table for mips is entirely speculative…
2016-10-31man: fix typos (#4527)Jakub Wilk
2016-10-28Merge pull request #4495 from topimiettinen/block-shmat-execDjalal Harouni
seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
2016-10-26seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecuteTopi Miettinen
shmat(..., SHM_EXEC) can be used to create writable and executable memory, so let's block it when MemoryDenyWriteExecute is set.
2016-10-24man: document the default value of NoNewPrivileges=Zbigniew Jędrzejewski-Szmek
Fixes #4329.
2016-10-20man: document default for User=Lennart Poettering
Replaces: #4375
2016-10-17core/exec: add a named-descriptor option ("fd") for streams (#4179)Luca Bruno
This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.
2016-10-17man: avoid abbreviated "cgroups" terminology (#4396)Lennart Poettering
Let's avoid the overly abbreviated "cgroups" terminology. Let's instead write: "Linux Control Groups (cgroups)" is the long form wherever the term is introduced in prose. Use "control groups" in the short form wherever the term is used within brief explanations. Follow-up to: #4381
2016-10-15man: add crosslink between systemd.resource-control(5) and systemd.exec(5)Zbigniew Jędrzejewski-Szmek
Fixes #4379.
2016-10-13Merge pull request #4243 from ↵Lennart Poettering
endocode/djalal/sandbox-first-protection-kernelmodules-v1 core:sandbox: Add ProtectKernelModules= and some fixes
2016-10-12man: typo fixesThomas Hindoe Paaboel Andersen
A mix of fixes for typos and UK english
2016-10-12core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=Djalal Harouni
Lets go further and make /lib/modules/ inaccessible for services that do not have business with modules, this is a minor improvment but it may help on setups with custom modules and they are limited... in regard of kernel auto-load feature. This change introduce NameSpaceInfo struct which we may embed later inside ExecContext but for now lets just reduce the argument number to setup_namespace() and merge ProtectKernelModules feature.
2016-10-12doc: minor hint about InaccessiblePaths= in regard of ProtectKernelTunables=Djalal Harouni
2016-10-12core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yesDjalal Harouni
The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw data through /proc, ioctl and some other exotic system calls...
2016-10-12core:sandbox: Add ProtectKernelModules= optionDjalal Harouni
This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.
2016-10-11Merge pull request #4348 from poettering/docfixesZbigniew Jędrzejewski-Szmek
Various smaller documentation fixes.
2016-10-11man: beef up documentation on per-unit resource limits a bitLennart Poettering
Let's clarify that for user services some OS-defined limits bound the settings in the unit files. Fixes: #4232
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
2016-10-05seccomp: add support for the s390 architecture (#4287)hbrueckner
Add seccomp support for the s390 architecture (31-bit and 64-bit) to systemd. This requires libseccomp >= 2.3.1.
2016-10-03man: remove consecutive duplicate words (#4268)Stefan Schweter
This PR removes consecutive duplicate words from the man pages of: * `resolved.conf.xml` * `systemd.exec.xml` * `systemd.socket.xml`
2016-09-25core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= ↵Djalal Harouni
is set Instead of having a local syscall list, use the @raw-io group which contains the same set of syscalls to filter.
2016-09-25core:sandbox: add more /proc/* entries to ProtectKernelTunables=Djalal Harouni
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface, filesystems configuration and IRQ tuning readonly. Most of these interfaces now days should be in /sys but they are still available through /proc, so just protect them. This patch does not touch /proc/net/...
2016-09-25doc: explicitly document that /dev/mem and /dev/port are blocked by ↵Djalal Harouni
PrivateDevices=true
2016-09-25doc: documentation fixes for ReadWritePaths= and ProtectKernelTunables=Djalal Harouni
Documentation fixes for ReadWritePaths= and ProtectKernelTunables= as reported by Evgeny Vereshchagin.
2016-09-25man: shorten the exit status table a bitLennart Poettering
Let's merge a couple of columns, to make the table a bit shorter. This effectively just drops whitespace, not contents, but makes the currently humungous table much much more compact.
2016-09-25man: the exit code/signal is stored in $EXIT_CODE, not $EXIT_STATUSLennart Poettering
2016-09-25man: rework documentation for ReadOnlyPaths= and related settingsLennart Poettering
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=, InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to the host file system. (Which wasn't true actually, as we didn't follow symlinks at all in the most recent releases, and we know do follow them, but relative to RootDirectory=). This also replaces all references to the fact that all fs namespacing options can be undone with enough privileges and disable propagation by a single one in the documentation of ReadOnlyPaths= and friends, and then directs the read to this in all other places. Moreover a hint is added to the documentation of SystemCallFilter=, suggesting usage of ~@mount in case any of the fs namespacing related options are used.
2016-09-25man: in user-facing documentaiton don't reference C function namesLennart Poettering
Let's drop the reference to the cap_from_name() function in the documentation for the capabilities setting, as it is hardly helpful. Our readers are not necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the strings we accept are just the plain capability names as listed in capabilities(7) hence there's really no point in confusing the user with anything else.
2016-09-25core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1Lennart Poettering
Let's make sure that services that use DynamicUser=1 cannot leave files in the file system should the system accidentally have a world-writable directory somewhere. This effectively ensures that directories need to be whitelisted rather than blacklisted for access when DynamicUser=1 is set.
2016-09-25core: introduce ProtectSystem=strictLennart Poettering
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a new setting "strict". If set, the entire directory tree of the system is mounted read-only, but the API file systems /proc, /dev, /sys are excluded (they may be managed with PrivateDevices= and ProtectKernelTunables=). Also, /home and /root are excluded as those are left for ProtectHome= to manage. In this mode, all "real" file systems (i.e. non-API file systems) are mounted read-only, and specific directories may only be excluded via ReadWriteDirectories=, thus implementing an effective whitelist instead of blacklist of writable directories. While we are at, also add /efi to the list of paths always affected by ProtectSystem=. This is a follow-up for b52a109ad38cd37b660ccd5394ff5c171a5e5355 which added /efi as alternative for /boot. Our namespacing logic should respect that too.
2016-09-25core: add two new service settings ProtectKernelTunables= and ↵Lennart Poettering
ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.
2016-08-19core: add RemoveIPC= settingLennart Poettering
This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.