diff options
Diffstat (limited to 'man/systemd.exec.xml')
-rw-r--r-- | man/systemd.exec.xml | 1022 |
1 files changed, 760 insertions, 262 deletions
diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml index 45a4422dc3..c088042a51 100644 --- a/man/systemd.exec.xml +++ b/man/systemd.exec.xml @@ -1,3 +1,4 @@ +<?xml version='1.0'?> <!--*- Mode: nxml; nxml-child-indent: 2; indent-tabs-mode: nil -*--> <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> @@ -76,6 +77,29 @@ </refsect1> <refsect1> + <title>Automatic Dependencies</title> + + <para>A few execution parameters result in additional, automatic + dependencies to be added.</para> + + <para>Units with <varname>WorkingDirectory=</varname> or + <varname>RootDirectory=</varname> set automatically gain + dependencies of type <varname>Requires=</varname> and + <varname>After=</varname> on all mount units required to access + the specified paths. This is equivalent to having them listed + explicitly in <varname>RequiresMountsFor=</varname>.</para> + + <para>Similar, units with <varname>PrivateTmp=</varname> enabled + automatically get mount unit dependencies for all mounts + required to access <filename>/tmp</filename> and + <filename>/var/tmp</filename>.</para> + + <para>Units whose standard output or error output is connected to <option>journal</option>, <option>syslog</option> + or <option>kmsg</option> (or their combinations with console output, see below) automatically acquire dependencies + of type <varname>After=</varname> on <filename>systemd-journald.socket</filename>.</para> + </refsect1> + + <refsect1> <title>Options</title> <variablelist class='unit-directives'> @@ -83,32 +107,71 @@ <varlistentry> <term><varname>WorkingDirectory=</varname></term> - <listitem><para>Takes an absolute directory path. Sets the - working directory for executed processes. If not set, defaults - to the root directory when systemd is running as a system - instance and the respective user's home directory if run as - user.</para></listitem> + <listitem><para>Takes a directory path relative to the service's root directory specified by + <varname>RootDirectory=</varname>, or the special value <literal>~</literal>. Sets the working directory for + executed processes. If set to <literal>~</literal>, the home directory of the user specified in + <varname>User=</varname> is used. If not set, defaults to the root directory when systemd is running as a + system instance and the respective user's home directory if run as user. If the setting is prefixed with the + <literal>-</literal> character, a missing working directory is not considered fatal. If + <varname>RootDirectory=</varname> is not set, then <varname>WorkingDirectory=</varname> is relative to the root + of the system running the service manager. Note that setting this parameter might result in additional + dependencies to be added to the unit (see above).</para></listitem> </varlistentry> <varlistentry> <term><varname>RootDirectory=</varname></term> - <listitem><para>Takes an absolute directory path. Sets the - root directory for executed processes, with the - <citerefentry project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry> - system call. If this is used, it must be ensured that the - process and all its auxiliary files are available in the - <function>chroot()</function> jail.</para></listitem> + <listitem><para>Takes a directory path relative to the host's root directory (i.e. the root of the system + running the service manager). Sets the root directory for executed processes, with the <citerefentry + project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry> system + call. If this is used, it must be ensured that the process binary and all its auxiliary files are available in + the <function>chroot()</function> jail. Note that setting this parameter might result in additional + dependencies to be added to the unit (see above).</para> + + <para>The <varname>PrivateUsers=</varname> setting is particularly useful in conjunction with + <varname>RootDirectory=</varname>. For details, see below.</para></listitem> </varlistentry> <varlistentry> <term><varname>User=</varname></term> <term><varname>Group=</varname></term> - <listitem><para>Sets the Unix user or group that the processes - are executed as, respectively. Takes a single user or group - name or ID as argument. If no group is set, the default group - of the user is chosen.</para></listitem> + <listitem><para>Set the UNIX user or group that the processes are executed as, respectively. Takes a single + user or group name, or numeric ID as argument. If no group is set, the default group of the user is used. This + setting does not affect commands whose command line is prefixed with <literal>+</literal>.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>DynamicUser=</varname></term> + + <listitem><para>Takes a boolean parameter. If set, a UNIX user and group pair is allocated dynamically when the + unit is started, and released as soon as it is stopped. The user and group will not be added to + <filename>/etc/passwd</filename> or <filename>/etc/group</filename>, but are managed transiently during + runtime. The <citerefentry><refentrytitle>nss-systemd</refentrytitle><manvolnum>8</manvolnum></citerefentry> + glibc NSS module provides integration of these dynamic users/groups into the system's user and group + databases. The user and group name to use may be configured via <varname>User=</varname> and + <varname>Group=</varname> (see above). If these options are not used and dynamic user/group allocation is + enabled for a unit, the name of the dynamic user/group is implicitly derived from the unit name. If the unit + name without the type suffix qualifies as valid user name it is used directly, otherwise a name incorporating a + hash of it is used. If a statically allocated user or group of the configured name already exists, it is used + and no dynamic user/group is allocated. Dynamic users/groups are allocated from the UID/GID range + 61184…65519. It is recommended to avoid this range for regular system or login users. At any point in time + each UID/GID from this range is only assigned to zero or one dynamically allocated users/groups in + use. However, UID/GIDs are recycled after a unit is terminated. Care should be taken that any processes running + as part of a unit for which dynamic users/groups are enabled do not leave files or directories owned by these + users/groups around, as a different unit might get the same UID/GID assigned later on, and thus gain access to + these files or directories. If <varname>DynamicUser=</varname> is enabled, <varname>RemoveIPC=</varname>, + <varname>PrivateTmp=</varname> are implied. This ensures that the lifetime of IPC objects and temporary files + created by the executed processes is bound to the runtime of the service, and hence the lifetime of the dynamic + user/group. Since <filename>/tmp</filename> and <filename>/var/tmp</filename> are usually the only + world-writable directories on a system this ensures that a unit making use of dynamic user/group allocation + cannot leave files around after unit termination. Moreover <varname>ProtectSystem=strict</varname> and + <varname>ProtectHome=read-only</varname> are implied, thus prohibiting the service to write to arbitrary file + system locations. In order to allow the service to write to certain directories, they have to be whitelisted + using <varname>ReadWritePaths=</varname>, but care must be taken so that UID/GID recycling doesn't + create security issues involving files created by the service. Use <varname>RuntimeDirectory=</varname> (see + below) in order to assign a writable runtime directory to a service, owned by the dynamic user/group and + removed automatically when the unit is terminated. Defaults to off.</para></listitem> </varlistentry> <varlistentry> @@ -117,13 +180,25 @@ <listitem><para>Sets the supplementary Unix groups the processes are executed as. This takes a space-separated list of group names or IDs. This option may be specified more than - once in which case all listed groups are set as supplementary - groups. When the empty string is assigned the list of + once, in which case all listed groups are set as supplementary + groups. When the empty string is assigned, the list of supplementary groups is reset, and all assignments prior to this one will have no effect. In any way, this option does not override, but extends the list of supplementary groups configured in the system group database for the - user.</para></listitem> + user. This does not affect commands prefixed with <literal>+</literal>.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>RemoveIPC=</varname></term> + + <listitem><para>Takes a boolean parameter. If set, all System V and POSIX IPC objects owned by the user and + group the processes of this unit are run as are removed when the unit is stopped. This setting only has an + effect if at least one of <varname>User=</varname>, <varname>Group=</varname> and + <varname>DynamicUser=</varname> are used. It has no effect on IPC objects owned by the root user. Specifically, + this removes System V semaphores, as well as System V and POSIX shared memory segments and message queues. If + multiple units use the same user or group the IPC objects are removed when the last of these units is + stopped. This setting is implied if <varname>DynamicUser=</varname> is set.</para></listitem> </varlistentry> <varlistentry> @@ -151,7 +226,7 @@ <varlistentry> <term><varname>IOSchedulingClass=</varname></term> - <listitem><para>Sets the IO scheduling class for executed + <listitem><para>Sets the I/O scheduling class for executed processes. Takes an integer between 0 and 3 or one of the strings <option>none</option>, <option>realtime</option>, <option>best-effort</option> or <option>idle</option>. See @@ -162,10 +237,10 @@ <varlistentry> <term><varname>IOSchedulingPriority=</varname></term> - <listitem><para>Sets the IO scheduling priority for executed + <listitem><para>Sets the I/O scheduling priority for executed processes. Takes an integer between 0 (highest priority) and 7 (lowest priority). The available priorities depend on the - selected IO scheduling class (see above). See + selected I/O scheduling class (see above). See <citerefentry><refentrytitle>ioprio_set</refentrytitle><manvolnum>2</manvolnum></citerefentry> for details.</para></listitem> </varlistentry> @@ -211,8 +286,10 @@ <term><varname>CPUAffinity=</varname></term> <listitem><para>Controls the CPU affinity of the executed - processes. Takes a space-separated list of CPU indices. This - option may be specified more than once in which case the + processes. Takes a list of CPU indices or ranges separated by + either whitespace or commas. CPU ranges are specified by the + lower and upper CPU indices separated by a dash. + This option may be specified more than once, in which case the specified CPU affinity masks are merged. If the empty string is assigned, the mask is reset, all assignments prior to this will have no effect. See @@ -234,7 +311,7 @@ <listitem><para>Sets environment variables for executed processes. Takes a space-separated list of variable - assignments. This option may be specified more than once in + assignments. This option may be specified more than once, in which case all listed variables will be set. If the same variable is set twice, the later setting will override the earlier setting. If the empty string is assigned to this @@ -263,7 +340,8 @@ <listitem><para>Similar to <varname>Environment=</varname> but reads the environment variables from a text file. The text file should contain new-line-separated variable assignments. - Empty lines and lines starting with ; or # will be ignored, + Empty lines, lines without an <literal>=</literal> separator, + or lines starting with ; or # will be ignored, which may be used for commenting. A line ending with a backslash will be concatenated with the following one, allowing multiline variable definitions. The parser strips @@ -294,6 +372,33 @@ </varlistentry> <varlistentry> + <term><varname>PassEnvironment=</varname></term> + + <listitem><para>Pass environment variables from the systemd system + manager to executed processes. Takes a space-separated list of variable + names. This option may be specified more than once, in which case all + listed variables will be set. If the empty string is assigned to this + option, the list of environment variables is reset, all prior + assignments have no effect. Variables that are not set in the system + manager will not be passed and will be silently ignored.</para> + + <para>Variables passed from this setting are overridden by those passed + from <varname>Environment=</varname> or + <varname>EnvironmentFile=</varname>.</para> + + <para>Example: + <programlisting>PassEnvironment=VAR1 VAR2 VAR3</programlisting> + passes three variables <literal>VAR1</literal>, + <literal>VAR2</literal>, <literal>VAR3</literal> + with the values set for those variables in PID1.</para> + + <para> + See + <citerefentry project='man-pages'><refentrytitle>environ</refentrytitle><manvolnum>7</manvolnum></citerefentry> + for details about environment variables.</para></listitem> + </varlistentry> + + <varlistentry> <term><varname>StandardInput=</varname></term> <listitem><para>Controls where file descriptor 0 (STDIN) of the executed processes is connected to. Takes one of @@ -342,6 +447,7 @@ <para>This setting defaults to <option>null</option>.</para></listitem> </varlistentry> + <varlistentry> <term><varname>StandardOutput=</varname></term> <listitem><para>Controls where file descriptor 1 (STDOUT) of @@ -404,11 +510,18 @@ similar to the same option of <varname>StandardInput=</varname>.</para> + <para>If the standard output (or error output, see below) of a unit is connected to the journal, syslog or the + kernel log buffer, the unit will implicitly gain a dependency of type <varname>After=</varname> on + <filename>systemd-journald.socket</filename> (also see the automatic dependencies section above).</para> + <para>This setting defaults to the value set with <option>DefaultStandardOutput=</option> in <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>, - which defaults to <option>journal</option>.</para></listitem> + which defaults to <option>journal</option>. Note that setting + this parameter might result in additional dependencies to be + added to the unit (see above).</para></listitem> </varlistentry> + <varlistentry> <term><varname>StandardError=</varname></term> <listitem><para>Controls where file descriptor 2 (STDERR) of @@ -419,8 +532,11 @@ standard error. This setting defaults to the value set with <option>DefaultStandardError=</option> in <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>, - which defaults to <option>inherit</option>.</para></listitem> + which defaults to <option>inherit</option>. Note that setting + this parameter might result in additional dependencies to be + added to the unit (see above).</para></listitem> </varlistentry> + <varlistentry> <term><varname>TTYPath=</varname></term> <listitem><para>Sets the terminal device node to use if @@ -484,7 +600,7 @@ </varlistentry> <varlistentry> <term><varname>SyslogLevel=</varname></term> - <listitem><para>Default syslog level to use when logging to + <listitem><para>The default syslog level to use when logging to syslog or the kernel log buffer. One of <option>emerg</option>, <option>alert</option>, @@ -503,7 +619,7 @@ different log level which can be used to override the default log level specified here. The interpretation of these prefixes may be disabled with <varname>SyslogLevelPrefix=</varname>, - see below. For details see + see below. For details, see <citerefentry><refentrytitle>sd-daemon</refentrytitle><manvolnum>3</manvolnum></citerefentry>. Defaults to @@ -555,92 +671,147 @@ <term><varname>LimitNICE=</varname></term> <term><varname>LimitRTPRIO=</varname></term> <term><varname>LimitRTTIME=</varname></term> - <listitem><para>These settings set both soft and hard limits - of various resources for executed processes. See - <citerefentry><refentrytitle>setrlimit</refentrytitle><manvolnum>2</manvolnum></citerefentry> - for details. Use the string <varname>infinity</varname> to - configure no limit on a specific resource.</para></listitem> + <listitem><para>Set soft and hard limits on various resources for executed processes. See + <citerefentry><refentrytitle>setrlimit</refentrytitle><manvolnum>2</manvolnum></citerefentry> for details on + the resource limit concept. Resource limits may be specified in two formats: either as single value to set a + specific soft and hard limit to the same value, or as colon-separated pair <option>soft:hard</option> to set + both limits individually (e.g. <literal>LimitAS=4G:16G</literal>). Use the string <varname>infinity</varname> + to configure no limit on a specific resource. The multiplicative suffixes K, M, G, T, P and E (to the base + 1024) may be used for resource limits measured in bytes (e.g. LimitAS=16G). For the limits referring to time + values, the usual time units ms, s, min, h and so on may be used (see + <citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry> for + details). Note that if no time unit is specified for <varname>LimitCPU=</varname> the default unit of seconds + is implied, while for <varname>LimitRTTIME=</varname> the default unit of microseconds is implied. Also, note + that the effective granularity of the limits might influence their enforcement. For example, time limits + specified for <varname>LimitCPU=</varname> will be rounded up implicitly to multiples of 1s. For + <varname>LimitNICE=</varname> the value may be specified in two syntaxes: if prefixed with <literal>+</literal> + or <literal>-</literal>, the value is understood as regular Linux nice value in the range -20..19. If not + prefixed like this the value is understood as raw resource limit parameter in the range 0..40 (with 0 being + equivalent to 1).</para> + + <para>Note that most process resource limits configured with + these options are per-process, and processes may fork in order + to acquire a new set of resources that are accounted + independently of the original process, and may thus escape + limits set. Also note that <varname>LimitRSS=</varname> is not + implemented on Linux, and setting it has no effect. Often it + is advisable to prefer the resource controls listed in + <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry> + over these per-process limits, as they apply to services as a + whole, may be altered dynamically at runtime, and are + generally more expressive. For example, + <varname>MemoryLimit=</varname> is a more powerful (and + working) replacement for <varname>LimitRSS=</varname>.</para> + + <para>For system units these resource limits may be chosen freely. For user units however (i.e. units run by a + per-user instance of + <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>), these limits are + bound by (possibly more restrictive) per-user limits enforced by the OS.</para> + + <para>Resource limits not configured explicitly for a unit default to the value configured in the various + <varname>DefaultLimitCPU=</varname>, <varname>DefaultLimitFSIZE=</varname>, … options available in + <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>, and – + if not configured there – the kernel or per-user defaults, as defined by the OS (the latter only for user + services, see above).</para> <table> - <title>Limit directives and their equivalent with ulimit</title> + <title>Resource limit directives, their equivalent <command>ulimit</command> shell commands and the unit used</title> - <tgroup cols='2'> + <tgroup cols='3'> <colspec colname='directive' /> <colspec colname='equivalent' /> + <colspec colname='unit' /> <thead> <row> <entry>Directive</entry> - <entry>ulimit equivalent</entry> + <entry><command>ulimit</command> equivalent</entry> + <entry>Unit</entry> </row> </thead> <tbody> <row> - <entry>LimitCPU</entry> + <entry>LimitCPU=</entry> <entry>ulimit -t</entry> + <entry>Seconds</entry> </row> <row> - <entry>LimitFSIZE</entry> + <entry>LimitFSIZE=</entry> <entry>ulimit -f</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitDATA</entry> + <entry>LimitDATA=</entry> <entry>ulimit -d</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitSTACK</entry> + <entry>LimitSTACK=</entry> <entry>ulimit -s</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitCORE</entry> + <entry>LimitCORE=</entry> <entry>ulimit -c</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitRSS</entry> + <entry>LimitRSS=</entry> <entry>ulimit -m</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitNOFILE</entry> + <entry>LimitNOFILE=</entry> <entry>ulimit -n</entry> + <entry>Number of File Descriptors</entry> </row> <row> - <entry>LimitAS</entry> + <entry>LimitAS=</entry> <entry>ulimit -v</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitNPROC</entry> + <entry>LimitNPROC=</entry> <entry>ulimit -u</entry> + <entry>Number of Processes</entry> </row> <row> - <entry>LimitMEMLOCK</entry> + <entry>LimitMEMLOCK=</entry> <entry>ulimit -l</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitLOCKS</entry> + <entry>LimitLOCKS=</entry> <entry>ulimit -x</entry> + <entry>Number of Locks</entry> </row> <row> - <entry>LimitSIGPENDING</entry> + <entry>LimitSIGPENDING=</entry> <entry>ulimit -i</entry> + <entry>Number of Queued Signals</entry> </row> <row> - <entry>LimitMSGQUEUE</entry> + <entry>LimitMSGQUEUE=</entry> <entry>ulimit -q</entry> + <entry>Bytes</entry> </row> <row> - <entry>LimitNICE</entry> + <entry>LimitNICE=</entry> <entry>ulimit -e</entry> + <entry>Nice Level</entry> </row> <row> - <entry>LimitRTPRIO</entry> + <entry>LimitRTPRIO=</entry> <entry>ulimit -r</entry> + <entry>Realtime Priority</entry> </row> <row> - <entry>LimitRTTIME</entry> + <entry>LimitRTTIME=</entry> <entry>No equivalent</entry> + <entry>Microseconds</entry> </row> </tbody> </tgroup> - </table> + </table></listitem> </varlistentry> <varlistentry> @@ -658,32 +829,40 @@ <varlistentry> <term><varname>CapabilityBoundingSet=</varname></term> - <listitem><para>Controls which capabilities to include in the - capability bounding set for the executed process. See - <citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> - for details. Takes a whitespace-separated list of capability - names as read by - <citerefentry project='mankier'><refentrytitle>cap_from_name</refentrytitle><manvolnum>3</manvolnum></citerefentry>, - e.g. <constant>CAP_SYS_ADMIN</constant>, - <constant>CAP_DAC_OVERRIDE</constant>, - <constant>CAP_SYS_PTRACE</constant>. Capabilities listed will - be included in the bounding set, all others are removed. If - the list of capabilities is prefixed with - <literal>~</literal>, all but the listed capabilities will be - included, the effect of the assignment inverted. Note that - this option also affects the respective capabilities in the - effective, permitted and inheritable capability sets, on top - of what <varname>Capabilities=</varname> does. If this option - is not used, the capability bounding set is not modified on - process execution, hence no limits on the capabilities of the - process are enforced. This option may appear more than once in - which case the bounding sets are merged. If the empty string - is assigned to this option, the bounding set is reset to the - empty capability set, and all prior settings have no effect. - If set to <literal>~</literal> (without any further argument), - the bounding set is reset to the full set of available - capabilities, also undoing any previous - settings.</para></listitem> + <listitem><para>Controls which capabilities to include in the capability bounding set for the executed + process. See <citerefentry + project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> for + details. Takes a whitespace-separated list of capability names, e.g. <constant>CAP_SYS_ADMIN</constant>, + <constant>CAP_DAC_OVERRIDE</constant>, <constant>CAP_SYS_PTRACE</constant>. Capabilities listed will be + included in the bounding set, all others are removed. If the list of capabilities is prefixed with + <literal>~</literal>, all but the listed capabilities will be included, the effect of the assignment + inverted. Note that this option also affects the respective capabilities in the effective, permitted and + inheritable capability sets. If this option is not used, the capability bounding set is not modified on process + execution, hence no limits on the capabilities of the process are enforced. This option may appear more than + once, in which case the bounding sets are merged. If the empty string is assigned to this option, the bounding + set is reset to the empty capability set, and all prior settings have no effect. If set to + <literal>~</literal> (without any further argument), the bounding set is reset to the full set of available + capabilities, also undoing any previous settings. This does not affect commands prefixed with + <literal>+</literal>.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>AmbientCapabilities=</varname></term> + + <listitem><para>Controls which capabilities to include in the ambient capability set for the executed + process. Takes a whitespace-separated list of capability names, e.g. <constant>CAP_SYS_ADMIN</constant>, + <constant>CAP_DAC_OVERRIDE</constant>, <constant>CAP_SYS_PTRACE</constant>. This option may appear more than + once in which case the ambient capability sets are merged. If the list of capabilities is prefixed with + <literal>~</literal>, all but the listed capabilities will be included, the effect of the assignment + inverted. If the empty string is assigned to this option, the ambient capability set is reset to the empty + capability set, and all prior settings have no effect. If set to <literal>~</literal> (without any further + argument), the ambient capability set is reset to the full set of available capabilities, also undoing any + previous settings. Note that adding capabilities to ambient capability set adds them to the process's inherited + capability set. </para><para> Ambient capability sets are useful if you want to execute a process as a + non-privileged user but still want to give it some capabilities. Note that in this case option + <constant>keep-caps</constant> is automatically added to <varname>SecureBits=</varname> to retain the + capabilities over the user change. <varname>AmbientCapabilities=</varname> does not affect commands prefixed + with <literal>+</literal>.</para></listitem> </varlistentry> <varlistentry> @@ -697,118 +876,87 @@ <option>no-setuid-fixup-locked</option>, <option>noroot</option>, and <option>noroot-locked</option>. - This option may appear more than once in which case the secure + This option may appear more than once, in which case the secure bits are ORed. If the empty string is assigned to this option, - the bits are reset to 0. See - <citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> + the bits are reset to 0. This does not affect commands prefixed with <literal>+</literal>. + See <citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> for details.</para></listitem> </varlistentry> <varlistentry> - <term><varname>Capabilities=</varname></term> - <listitem><para>Controls the - <citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> - set for the executed process. Take a capability string - describing the effective, permitted and inherited capability - sets as documented in - <citerefentry project='mankier'><refentrytitle>cap_from_text</refentrytitle><manvolnum>3</manvolnum></citerefentry>. - Note that these capability sets are usually influenced (and - filtered) by the capabilities attached to the executed file. - Due to that <varname>CapabilityBoundingSet=</varname> is - probably a much more useful setting.</para></listitem> - </varlistentry> - - <varlistentry> - <term><varname>ReadWriteDirectories=</varname></term> - <term><varname>ReadOnlyDirectories=</varname></term> - <term><varname>InaccessibleDirectories=</varname></term> - - <listitem><para>Sets up a new file system namespace for - executed processes. These options may be used to limit access - a process might have to the main file system hierarchy. Each - setting takes a space-separated list of absolute directory - paths. Directories listed in - <varname>ReadWriteDirectories=</varname> are accessible from - within the namespace with the same access rights as from - outside. Directories listed in - <varname>ReadOnlyDirectories=</varname> are accessible for - reading only, writing will be refused even if the usual file - access controls would permit this. Directories listed in - <varname>InaccessibleDirectories=</varname> will be made - inaccessible for processes inside the namespace. Note that - restricting access with these options does not extend to - submounts of a directory that are created later on. These - options may be specified more than once in which case all - directories listed will have limited access from within the - namespace. If the empty string is assigned to this option, the - specific list is reset, and all prior assignments have no - effect.</para> - <para>Paths in - <varname>ReadOnlyDirectories=</varname> - and - <varname>InaccessibleDirectories=</varname> - may be prefixed with - <literal>-</literal>, in which case - they will be ignored when they do not - exist. Note that using this - setting will disconnect propagation of - mounts from the service to the host - (propagation in the opposite direction - continues to work). This means that - this setting may not be used for - services which shall be able to - install mount points in the main mount - namespace.</para></listitem> + <term><varname>ReadWritePaths=</varname></term> + <term><varname>ReadOnlyPaths=</varname></term> + <term><varname>InaccessiblePaths=</varname></term> + + <listitem><para>Sets up a new file system namespace for executed processes. These options may be used to limit + access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths + relative to the host's root directory (i.e. the system running the service manager). Note that if paths + contain symlinks, they are resolved relative to the root directory set with + <varname>RootDirectory=</varname>.</para> + + <para>Paths listed in <varname>ReadWritePaths=</varname> are accessible from within the namespace with the same + access modes as from outside of it. Paths listed in <varname>ReadOnlyPaths=</varname> are accessible for + reading only, writing will be refused even if the usual file access controls would permit this. Nest + <varname>ReadWritePaths=</varname> inside of <varname>ReadOnlyPaths=</varname> in order to provide writable + subdirectories within read-only directories. Use <varname>ReadWritePaths=</varname> in order to whitelist + specific paths for write access if <varname>ProtectSystem=strict</varname> is used. Paths listed in + <varname>InaccessiblePaths=</varname> will be made inaccessible for processes inside the namespace (along with + everything below them in the file system hierarchy).</para> + + <para>Note that restricting access with these options does not extend to submounts of a directory that are + created later on. Non-directory paths may be specified as well. These options may be specified more than once, + in which case all paths listed will have limited access from within the namespace. If the empty string is + assigned to this option, the specific list is reset, and all prior assignments have no effect.</para> + + <para>Paths in <varname>ReadWritePaths=</varname>, <varname>ReadOnlyPaths=</varname> and + <varname>InaccessiblePaths=</varname> may be prefixed with <literal>-</literal>, in which case they will be ignored + when they do not exist. Note that using this setting will disconnect propagation of mounts from the service to + the host (propagation in the opposite direction continues to work). This means that this setting may not be used + for services which shall be able to install mount points in the main mount namespace. Note that the effect of + these settings may be undone by privileged processes. In order to set up an effective sandboxed environment for + a unit it is thus recommended to combine these settings with either + <varname>CapabilityBoundingSet=~CAP_SYS_ADMIN</varname> or <varname>SystemCallFilter=~@mount</varname>.</para></listitem> </varlistentry> <varlistentry> <term><varname>PrivateTmp=</varname></term> - <listitem><para>Takes a boolean argument. If true, sets up a - new file system namespace for the executed processes and - mounts private <filename>/tmp</filename> and - <filename>/var/tmp</filename> directories inside it that is - not shared by processes outside of the namespace. This is - useful to secure access to temporary files of the process, but - makes sharing between processes via <filename>/tmp</filename> - or <filename>/var/tmp</filename> impossible. If this is - enabled, all temporary files created by a service in these - directories will be removed after the service is stopped. - Defaults to false. It is possible to run two or more units - within the same private <filename>/tmp</filename> and - <filename>/var/tmp</filename> namespace by using the + <listitem><para>Takes a boolean argument. If true, sets up a new file system namespace for the executed + processes and mounts private <filename>/tmp</filename> and <filename>/var/tmp</filename> directories inside it + that is not shared by processes outside of the namespace. This is useful to secure access to temporary files of + the process, but makes sharing between processes via <filename>/tmp</filename> or <filename>/var/tmp</filename> + impossible. If this is enabled, all temporary files created by a service in these directories will be removed + after the service is stopped. Defaults to false. It is possible to run two or more units within the same + private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the <varname>JoinsNamespaceOf=</varname> directive, see - <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> - for details. Note that using this setting will disconnect - propagation of mounts from the service to the host - (propagation in the opposite direction continues to work). - This means that this setting may not be used for services - which shall be able to install mount points in the main mount - namespace.</para></listitem> + <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for + details. This setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same + restrictions regarding mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and + related calls, see above.</para></listitem> + </varlistentry> <varlistentry> <term><varname>PrivateDevices=</varname></term> - <listitem><para>Takes a boolean argument. If true, sets up a - new /dev namespace for the executed processes and only adds - API pseudo devices such as <filename>/dev/null</filename>, - <filename>/dev/zero</filename> or - <filename>/dev/random</filename> (as well as the pseudo TTY - subsystem) to it, but no physical devices such as - <filename>/dev/sda</filename>. This is useful to securely turn - off physical device access by the executed process. Defaults - to false. Enabling this option will also remove - <constant>CAP_MKNOD</constant> from the capability bounding - set for the unit (see above), and set + <listitem><para>Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and + only adds API pseudo devices such as <filename>/dev/null</filename>, <filename>/dev/zero</filename> or + <filename>/dev/random</filename> (as well as the pseudo TTY subsystem) to it, but no physical devices such as + <filename>/dev/sda</filename>, system memory <filename>/dev/mem</filename>, system ports + <filename>/dev/port</filename> and others. This is useful to securely turn off physical device access by the + executed process. Defaults to false. Enabling this option will install a system call filter to block low-level + I/O system calls that are grouped in the <varname>@raw-io</varname> set, will also remove + <constant>CAP_MKNOD</constant> from the capability bounding set for the unit (see above), and set <varname>DevicePolicy=closed</varname> (see <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry> - for details). Note that using this setting will disconnect - propagation of mounts from the service to the host - (propagation in the opposite direction continues to work). - This means that this setting may not be used for services - which shall be able to install mount points in the main mount - namespace.</para></listitem> + for details). Note that using this setting will disconnect propagation of mounts from the service to the host + (propagation in the opposite direction continues to work). This means that this setting may not be used for + services which shall be able to install mount points in the main mount namespace. The /dev namespace will be + mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by + using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> of + <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>. This setting is implied if + <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding mount propagation and + privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem> </varlistentry> <varlistentry> @@ -833,76 +981,104 @@ </varlistentry> <varlistentry> + <term><varname>PrivateUsers=</varname></term> + + <listitem><para>Takes a boolean argument. If true, sets up a new user namespace for the executed processes and + configures a minimal user and group mapping, that maps the <literal>root</literal> user and group as well as + the unit's own user and group to themselves and everything else to the <literal>nobody</literal> user and + group. This is useful to securely detach the user and group databases used by the unit from the rest of the + system, and thus to create an effective sandbox environment. All files, directories, processes, IPC objects and + other resources owned by users/groups not equaling <literal>root</literal> or the unit's own will stay visible + from within the unit but appear owned by the <literal>nobody</literal> user and group. If this mode is enabled, + all unit processes are run without privileges in the host user namespace (regardless if the unit's own + user/group is <literal>root</literal> or not). Specifically this means that the process will have zero process + capabilities on the host's user namespace, but full capabilities within the service's user namespace. Settings + such as <varname>CapabilityBoundingSet=</varname> will affect only the latter, and there's no way to acquire + additional capabilities in the host's user namespace. Defaults to off.</para> + + <para>This setting is particularly useful in conjunction with <varname>RootDirectory=</varname>, as the need to + synchronize the user and group databases in the root directory and on the host is reduced, as the only users + and groups who need to be matched are <literal>root</literal>, <literal>nobody</literal> and the unit's own + user and group.</para></listitem> + </varlistentry> + + <varlistentry> <term><varname>ProtectSystem=</varname></term> - <listitem><para>Takes a boolean argument or - <literal>full</literal>. If true, mounts the - <filename>/usr</filename> and <filename>/boot</filename> - directories read-only for processes invoked by this unit. If - set to <literal>full</literal>, the <filename>/etc</filename> - directory is mounted read-only, too. This setting ensures that - any modification of the vendor supplied operating system (and - optionally its configuration) is prohibited for the service. - It is recommended to enable this setting for all long-running - services, unless they are involved with system updates or need - to modify the operating system in other ways. Note however - that processes retaining the CAP_SYS_ADMIN capability can undo - the effect of this setting. This setting is hence particularly - useful for daemons which have this capability removed, for - example with <varname>CapabilityBoundingSet=</varname>. - Defaults to off.</para></listitem> + <listitem><para>Takes a boolean argument or the special values <literal>full</literal> or + <literal>strict</literal>. If true, mounts the <filename>/usr</filename> and <filename>/boot</filename> + directories read-only for processes invoked by this unit. If set to <literal>full</literal>, the + <filename>/etc</filename> directory is mounted read-only, too. If set to <literal>strict</literal> the entire + file system hierarchy is mounted read-only, except for the API file system subtrees <filename>/dev</filename>, + <filename>/proc</filename> and <filename>/sys</filename> (protect these directories using + <varname>PrivateDevices=</varname>, <varname>ProtectKernelTunables=</varname>, + <varname>ProtectControlGroups=</varname>). This setting ensures that any modification of the vendor-supplied + operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is + recommended to enable this setting for all long-running services, unless they are involved with system updates + or need to modify the operating system in other ways. If this option is used, + <varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. This + setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding + mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see + above. Defaults to off.</para></listitem> </varlistentry> <varlistentry> <term><varname>ProtectHome=</varname></term> - <listitem><para>Takes a boolean argument or - <literal>read-only</literal>. If true, the directories - <filename>/home</filename>, <filename>/root</filename> and - <filename>/run/user</filename> - are made inaccessible and empty for processes invoked by this - unit. If set to <literal>read-only</literal>, the three - directories are made read-only instead. It is recommended to - enable this setting for all long-running services (in - particular network-facing ones), to ensure they cannot get - access to private user data, unless the services actually - require access to the user's private data. Note however that - processes retaining the CAP_SYS_ADMIN capability can undo the - effect of this setting. This setting is hence particularly - useful for daemons which have this capability removed, for - example with <varname>CapabilityBoundingSet=</varname>. - Defaults to off.</para></listitem> + <listitem><para>Takes a boolean argument or <literal>read-only</literal>. If true, the directories + <filename>/home</filename>, <filename>/root</filename> and <filename>/run/user</filename> are made inaccessible + and empty for processes invoked by this unit. If set to <literal>read-only</literal>, the three directories are + made read-only instead. It is recommended to enable this setting for all long-running services (in particular + network-facing ones), to ensure they cannot get access to private user data, unless the services actually + require access to the user's private data. This setting is implied if <varname>DynamicUser=</varname> is + set. For this setting the same restrictions regarding mount propagation and privileges apply as for + <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>ProtectKernelTunables=</varname></term> + + <listitem><para>Takes a boolean argument. If true, kernel variables accessible through + <filename>/proc/sys</filename>, <filename>/sys</filename>, <filename>/proc/sysrq-trigger</filename>, + <filename>/proc/latency_stats</filename>, <filename>/proc/acpi</filename>, + <filename>/proc/timer_stats</filename>, <filename>/proc/fs</filename> and <filename>/proc/irq</filename> will + be made read-only to all processes of the unit. Usually, tunable kernel variables should only be written at + boot-time, with the <citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> + mechanism. Almost no services need to write to these at runtime; it is hence recommended to turn this on for + most services. For this setting the same restrictions regarding mount propagation and privileges apply as for + <varname>ReadOnlyPaths=</varname> and related calls, see above. Defaults to off.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>ProtectControlGroups=</varname></term> + + <listitem><para>Takes a boolean argument. If true, the Linux Control Groups (<citerefentry + project='man-pages'><refentrytitle>cgroups</refentrytitle><manvolnum>7</manvolnum></citerefentry>) hierarchies + accessible through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the + unit. Except for container managers no services should require write access to the control groups hierarchies; + it is hence recommended to turn this on for most services. For this setting the same restrictions regarding + mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see + above. Defaults to off.</para></listitem> </varlistentry> <varlistentry> <term><varname>MountFlags=</varname></term> - <listitem><para>Takes a mount propagation flag: - <option>shared</option>, <option>slave</option> or - <option>private</option>, which control whether mounts in the - file system namespace set up for this unit's processes will - receive or propagate mounts or unmounts. See - <citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> - for details. Defaults to <option>shared</option>. Use - <option>shared</option> to ensure that mounts and unmounts are - propagated from the host to the container and vice versa. Use - <option>slave</option> to run processes so that none of their - mounts and unmounts will propagate to the host. Use - <option>private</option> to also ensure that no mounts and - unmounts from the host will propagate into the unit processes' - namespace. Note that <option>slave</option> means that file - systems mounted on the host might stay mounted continuously in - the unit's namespace, and thus keep the device busy. Note that - the file system namespace related options - (<varname>PrivateTmp=</varname>, - <varname>PrivateDevices=</varname>, - <varname>ProtectSystem=</varname>, - <varname>ProtectHome=</varname>, - <varname>ReadOnlyDirectories=</varname>, - <varname>InaccessibleDirectories=</varname> and - <varname>ReadWriteDirectories=</varname>) require that mount - and unmount propagation from the unit's file system namespace - is disabled, and hence downgrade <option>shared</option> to + <listitem><para>Takes a mount propagation flag: <option>shared</option>, <option>slave</option> or + <option>private</option>, which control whether mounts in the file system namespace set up for this unit's + processes will receive or propagate mounts or unmounts. See <citerefentry + project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> for + details. Defaults to <option>shared</option>. Use <option>shared</option> to ensure that mounts and unmounts + are propagated from the host to the container and vice versa. Use <option>slave</option> to run processes so + that none of their mounts and unmounts will propagate to the host. Use <option>private</option> to also ensure + that no mounts and unmounts from the host will propagate into the unit processes' namespace. Note that + <option>slave</option> means that file systems mounted on the host might stay mounted continuously in the + unit's namespace, and thus keep the device busy. Note that the file system namespace related options + (<varname>PrivateTmp=</varname>, <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>, + <varname>ProtectHome=</varname>, <varname>ProtectKernelTunables=</varname>, + <varname>ProtectControlGroups=</varname>, <varname>ReadOnlyPaths=</varname>, + <varname>InaccessiblePaths=</varname>, <varname>ReadWritePaths=</varname>) require that mount and unmount + propagation from the unit's file system namespace is disabled, and hence downgrade <option>shared</option> to <option>slave</option>. </para></listitem> </varlistentry> @@ -910,10 +1086,16 @@ <term><varname>UtmpIdentifier=</varname></term> <listitem><para>Takes a four character identifier string for - an utmp/wtmp entry for this service. This should only be set - for services such as <command>getty</command> implementations + an <citerefentry + project='man-pages'><refentrytitle>utmp</refentrytitle><manvolnum>5</manvolnum></citerefentry> + and wtmp entry for this service. This should only be + set for services such as <command>getty</command> + implementations (such as <citerefentry + project='die-net'><refentrytitle>agetty</refentrytitle><manvolnum>8</manvolnum></citerefentry>) where utmp/wtmp entries must be created and cleared before and - after execution. If the configured string is longer than four + after execution, or for services that shall be executed as if + they were run by a <command>getty</command> process (see + below). If the configured string is longer than four characters, it is truncated and the terminal four characters are used. This setting interprets %I style string replacements. This setting is unset by default, i.e. no @@ -922,6 +1104,34 @@ </varlistentry> <varlistentry> + <term><varname>UtmpMode=</varname></term> + + <listitem><para>Takes one of <literal>init</literal>, + <literal>login</literal> or <literal>user</literal>. If + <varname>UtmpIdentifier=</varname> is set, controls which + type of <citerefentry + project='man-pages'><refentrytitle>utmp</refentrytitle><manvolnum>5</manvolnum></citerefentry>/wtmp + entries for this service are generated. This setting has no + effect unless <varname>UtmpIdentifier=</varname> is set + too. If <literal>init</literal> is set, only an + <constant>INIT_PROCESS</constant> entry is generated and the + invoked process must implement a + <command>getty</command>-compatible utmp/wtmp logic. If + <literal>login</literal> is set, first an + <constant>INIT_PROCESS</constant> entry, followed by a + <constant>LOGIN_PROCESS</constant> entry is generated. In + this case, the invoked process must implement a <citerefentry + project='die-net'><refentrytitle>login</refentrytitle><manvolnum>1</manvolnum></citerefentry>-compatible + utmp/wtmp logic. If <literal>user</literal> is set, first an + <constant>INIT_PROCESS</constant> entry, then a + <constant>LOGIN_PROCESS</constant> entry and finally a + <constant>USER_PROCESS</constant> entry is generated. In this + case, the invoked process may be any process that is suitable + to be run as session leader. Defaults to + <literal>init</literal>.</para></listitem> + </varlistentry> + + <varlistentry> <term><varname>SELinuxContext=</varname></term> <listitem><para>Set the SELinux security context of the @@ -929,8 +1139,8 @@ domain transition. However, the policy still needs to authorize the transition. This directive is ignored if SELinux is disabled. If prefixed by <literal>-</literal>, all errors - will be ignored. See - <citerefentry project='die-net'><refentrytitle>setexeccon</refentrytitle><manvolnum>3</manvolnum></citerefentry> + will be ignored. This does not affect commands prefixed with <literal>+</literal>. + See <citerefentry project='die-net'><refentrytitle>setexeccon</refentrytitle><manvolnum>3</manvolnum></citerefentry> for details.</para></listitem> </varlistentry> @@ -942,7 +1152,7 @@ Profiles must already be loaded in the kernel, or the unit will fail. This result in a non operation if AppArmor is not enabled. If prefixed by <literal>-</literal>, all errors will - be ignored. </para></listitem> + be ignored. This does not affect commands prefixed with <literal>+</literal>.</para></listitem> </varlistentry> <varlistentry> @@ -951,7 +1161,7 @@ <listitem><para>Takes a <option>SMACK64</option> security label as argument. The process executed by the unit will be started under this label and SMACK will decide whether the - processes is allowed to run or not based on it. The process + process is allowed to run or not, based on it. The process will continue to run under the label specified here unless the executable has its own <option>SMACK64EXEC</option> label, in which case the process will transition to run under that @@ -961,7 +1171,8 @@ <para>The value may be prefixed by <literal>-</literal>, in which case all errors will be ignored. An empty value may be - specified to unset previous assignments.</para> + specified to unset previous assignments. This does not affect + commands prefixed with <literal>+</literal>.</para> </listitem> </varlistentry> @@ -997,7 +1208,9 @@ first character of the list is <literal>~</literal>, the effect is inverted: only the listed system calls will result in immediate process termination (blacklisting). If running in - user mode and this option is used, + user mode, or in system mode, but without the + <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting + <varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is implied. This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp filtering') and is useful for enforcing a @@ -1007,10 +1220,10 @@ <function>sigreturn</function>, <function>exit_group</function>, <function>exit</function> system calls are implicitly whitelisted and do not need to be - listed explicitly. This option may be specified more than once + listed explicitly. This option may be specified more than once, in which case the filter masks are merged. If the empty string is assigned, the filter is reset, all prior assignments will - have no effect.</para> + have no effect. This does not affect commands prefixed with <literal>+</literal>.</para> <para>If you specify both types of this option (i.e. whitelisting and blacklisting), the first encountered will @@ -1023,7 +1236,92 @@ <function>read</function> and <function>write</function>, and right after it add a blacklisting of <function>write</function>, then <function>write</function> - will be removed from the set.) </para></listitem> + will be removed from the set.)</para> + + <para>As the number of possible system + calls is large, predefined sets of system calls are provided. + A set starts with <literal>@</literal> character, followed by + name of the set. + + <table> + <title>Currently predefined system call sets</title> + + <tgroup cols='2'> + <colspec colname='set' /> + <colspec colname='description' /> + <thead> + <row> + <entry>Set</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry>@clock</entry> + <entry>System calls for changing the system clock (<citerefentry project='man-pages'><refentrytitle>adjtimex</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>settimeofday</refentrytitle><manvolnum>2</manvolnum></citerefentry>, and related calls)</entry> + </row> + <row> + <entry>@cpu-emulation</entry> + <entry>System calls for CPU emulation functionality (<citerefentry project='man-pages'><refentrytitle>vm86</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry> + </row> + <row> + <entry>@debug</entry> + <entry>Debugging, performance monitoring and tracing functionality (<citerefentry project='man-pages'><refentrytitle>ptrace</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>perf_event_open</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry> + </row> + <row> + <entry>@io-event</entry> + <entry>Event loop system calls (<citerefentry project='man-pages'><refentrytitle>poll</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>select</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>epoll</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>eventfd</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry> + </row> + <row> + <entry>@ipc</entry> + <entry>SysV IPC, POSIX Message Queues or other IPC (<citerefentry project='man-pages'><refentrytitle>mq_overview</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>svipc</refentrytitle><manvolnum>7</manvolnum></citerefentry>)</entry> + </row> + <row> + <entry>@keyring</entry> + <entry>Kernel keyring access (<citerefentry project='man-pages'><refentrytitle>keyctl</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry> + </row> + <row> + <entry>@module</entry> + <entry>Kernel module control (<citerefentry project='man-pages'><refentrytitle>init_module</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>delete_module</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry> + </row> + <row> + <entry>@mount</entry> + <entry>File system mounting and unmounting (<citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry>, and related calls)</entry> + </row> + <row> + <entry>@network-io</entry> + <entry>Socket I/O (including local AF_UNIX): <citerefentry project='man-pages'><refentrytitle>socket</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>unix</refentrytitle><manvolnum>7</manvolnum></citerefentry></entry> + </row> + <row> + <entry>@obsolete</entry> + <entry>Unusual, obsolete or unimplemented (<citerefentry project='man-pages'><refentrytitle>create_module</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>gtty</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry> + </row> + <row> + <entry>@privileged</entry> + <entry>All system calls which need super-user capabilities (<citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry>)</entry> + </row> + <row> + <entry>@process</entry> + <entry>Process control, execution, namespaces (<citerefentry project='man-pages'><refentrytitle>execve</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>kill</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>, …</entry> + </row> + <row> + <entry>@raw-io</entry> + <entry>Raw I/O port access (<citerefentry project='man-pages'><refentrytitle>ioperm</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>iopl</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <function>pciconfig_read()</function>, …</entry> + </row> + </tbody> + </tgroup> + </table> + + Note, that as new system calls are added to the kernel, additional system calls might be added to the groups + above, so the contents of the sets may change between systemd versions.</para> + + <para>It is recommended to combine the file system namespacing related options with + <varname>SystemCallFilter=~@mount</varname>, in order to prohibit the unit's processes to undo the + mappings. Specifically these are the options <varname>PrivateTmp=</varname>, + <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>, <varname>ProtectHome=</varname>, + <varname>ProtectKernelTunables=</varname>, <varname>ProtectControlGroups=</varname>, + <varname>ReadOnlyPaths=</varname>, <varname>InaccessiblePaths=</varname> and + <varname>ReadWritePaths=</varname>.</para></listitem> </varlistentry> <varlistentry> @@ -1043,11 +1341,12 @@ <varlistentry> <term><varname>SystemCallArchitectures=</varname></term> - <listitem><para>Takes a space separated list of architecture + <listitem><para>Takes a space-separated list of architecture identifiers to include in the system call filter. The known architecture identifiers are <constant>x86</constant>, <constant>x86-64</constant>, <constant>x32</constant>, - <constant>arm</constant> as well as the special identifier + <constant>arm</constant>, <constant>s390</constant>, + <constant>s390x</constant> as well as the special identifier <constant>native</constant>. Only system calls of the specified architectures will be permitted to processes of this unit. This is an effective way to disable compatibility with @@ -1056,8 +1355,10 @@ systems. The special <constant>native</constant> identifier implicitly maps to the native architecture of the system (or more strictly: to the architecture the system manager is - compiled for). If running in user mode and this option is - used, <varname>NoNewPrivileges=yes</varname> is implied. Note + compiled for). If running in user mode, or in system mode, + but without the <constant>CAP_SYS_ADMIN</constant> + capability (e.g. setting <varname>User=nobody</varname>), + <varname>NoNewPrivileges=yes</varname> is implied. Note that setting this option to a non-empty list implies that <constant>native</constant> is included too. By default, this option is set to the empty list, i.e. no architecture system @@ -1086,8 +1387,10 @@ <function>socketpair()</function> (which creates connected AF_UNIX sockets only) are unaffected. Note that this option has no effect on 32-bit x86 and is ignored (but works - correctly on x86-64). If running in user mode and this option - is used, <varname>NoNewPrivileges=yes</varname> is implied. By + correctly on x86-64). If running in user mode, or in system + mode, but without the <constant>CAP_SYS_ADMIN</constant> + capability (e.g. setting <varname>User=nobody</varname>), + <varname>NoNewPrivileges=yes</varname> is implied. By default, no restriction applies, all address families are accessible to processes. If assigned the empty string, any previous list changes are undone.</para> @@ -1098,20 +1401,23 @@ family should be included in the configured whitelist as it is frequently used for local communication, including for <citerefentry><refentrytitle>syslog</refentrytitle><manvolnum>2</manvolnum></citerefentry> - logging.</para></listitem> + logging. This does not affect commands prefixed with <literal>+</literal>.</para></listitem> </varlistentry> <varlistentry> <term><varname>Personality=</varname></term> - <listitem><para>Controls which kernel architecture - <citerefentry project='man-pages'><refentrytitle>uname</refentrytitle><manvolnum>2</manvolnum></citerefentry> - shall report, when invoked by unit processes. Takes one of - <constant>x86</constant> and <constant>x86-64</constant>. This - is useful when running 32-bit services on a 64-bit host - system. If not specified, the personality is left unmodified - and thus reflects the personality of the host system's - kernel.</para></listitem> + <listitem><para>Controls which kernel architecture <citerefentry + project='man-pages'><refentrytitle>uname</refentrytitle><manvolnum>2</manvolnum></citerefentry> shall report, + when invoked by unit processes. Takes one of the architecture identifiers <constant>x86</constant>, + <constant>x86-64</constant>, <constant>ppc</constant>, <constant>ppc-le</constant>, <constant>ppc64</constant>, + <constant>ppc64-le</constant>, <constant>s390</constant> or <constant>s390x</constant>. Which personality + architectures are supported depends on the system architecture. Usually the 64bit versions of the various + system architectures support their immediate 32bit personality architecture counterpart, but no others. For + example, <constant>x86-64</constant> systems support the <constant>x86-64</constant> and + <constant>x86</constant> personalities but no others. The personality feature is useful when running 32-bit + services on a 64-bit host system. If not specified, the personality is left unmodified and thus reflects the + personality of the host system's kernel.</para></listitem> </varlistentry> <varlistentry> @@ -1140,6 +1446,35 @@ <citerefentry><refentrytitle>tmpfiles.d</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para></listitem> </varlistentry> + <varlistentry> + <term><varname>MemoryDenyWriteExecute=</varname></term> + + <listitem><para>Takes a boolean argument. If set, attempts to create memory mappings that are writable and + executable at the same time, or to change existing memory mappings to become executable are prohibited. + Specifically, a system call filter is added that rejects + <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> + system calls with both <constant>PROT_EXEC</constant> and <constant>PROT_WRITE</constant> set + and <citerefentry><refentrytitle>mprotect</refentrytitle><manvolnum>2</manvolnum></citerefentry> + system calls with <constant>PROT_EXEC</constant> set. Note that this option is incompatible with programs + that generate program code dynamically at runtime, such as JIT execution engines, or programs compiled making + use of the code "trampoline" feature of various C compilers. This option improves service security, as it makes + harder for software exploits to change running code dynamically. + </para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>RestrictRealtime=</varname></term> + + <listitem><para>Takes a boolean argument. If set, any attempts to enable realtime scheduling in a process of + the unit are refused. This restricts access to realtime task scheduling policies such as + <constant>SCHED_FIFO</constant>, <constant>SCHED_RR</constant> or <constant>SCHED_DEADLINE</constant>. See + <citerefentry project='man-pages'><refentrytitle>sched</refentrytitle><manvolnum>7</manvolnum></citerefentry> for details about + these scheduling policies. Realtime scheduling policies may be used to monopolize CPU time for longer periods + of time, and may hence be used to lock up or otherwise trigger Denial-of-Service situations on the system. It + is hence recommended to restrict access to realtime scheduling to the few programs that actually require + them. Defaults to off.</para></listitem> + </varlistentry> + </variablelist> </refsect1> @@ -1190,6 +1525,16 @@ </varlistentry> <varlistentry> + <term><varname>$INVOCATION_ID</varname></term> + + <listitem><para>Contains a randomized, unique 128bit ID identifying each runtime cycle of the unit, formatted + as 32 character hexadecimal string. A new ID is assigned each time the unit changes from an inactive state into + an activating or active state, and may be used to identify this specific runtime cycle, in particular in data + stored offline, such as the journal. The same ID is passed to all processes run as part of the + unit.</para></listitem> + </varlistentry> + + <varlistentry> <term><varname>$XDG_RUNTIME_DIR</varname></term> <listitem><para>The directory for volatile state. Set for the @@ -1215,7 +1560,7 @@ <varlistentry> <term><varname>$MAINPID</varname></term> - <listitem><para>The PID of the units main process if it is + <listitem><para>The PID of the unit's main process if it is known. This is only set for control processes as invoked by <varname>ExecReload=</varname> and similar. </para></listitem> </varlistentry> @@ -1230,6 +1575,7 @@ <varlistentry> <term><varname>$LISTEN_FDS</varname></term> <term><varname>$LISTEN_PID</varname></term> + <term><varname>$LISTEN_FDNAMES</varname></term> <listitem><para>Information about file descriptors passed to a service for socket activation. See @@ -1238,6 +1584,24 @@ </varlistentry> <varlistentry> + <term><varname>$NOTIFY_SOCKET</varname></term> + + <listitem><para>The socket + <function>sd_notify()</function> talks to. See + <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>. + </para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>$WATCHDOG_PID</varname></term> + <term><varname>$WATCHDOG_USEC</varname></term> + + <listitem><para>Information about watchdog keep-alive notifications. See + <citerefentry><refentrytitle>sd_watchdog_enabled</refentrytitle><manvolnum>3</manvolnum></citerefentry>. + </para></listitem> + </varlistentry> + + <varlistentry> <term><varname>$TERM</varname></term> <listitem><para>Terminal type, set only for units connected to @@ -1247,12 +1611,144 @@ <citerefentry project='man-pages'><refentrytitle>termcap</refentrytitle><manvolnum>5</manvolnum></citerefentry>. </para></listitem> </varlistentry> + + <varlistentry> + <term><varname>$JOURNAL_STREAM</varname></term> + + <listitem><para>If the standard output or standard error output of the executed processes are connected to the + journal (for example, by setting <varname>StandardError=journal</varname>) <varname>$JOURNAL_STREAM</varname> + contains the device and inode numbers of the connection file descriptor, formatted in decimal, separated by a + colon (<literal>:</literal>). This permits invoked processes to safely detect whether their standard output or + standard error output are connected to the journal. The device and inode numbers of the file descriptors should + be compared with the values set in the environment variable to determine whether the process output is still + connected to the journal. Note that it is generally not sufficient to only check whether + <varname>$JOURNAL_STREAM</varname> is set at all as services might invoke external processes replacing their + standard output or standard error output, without unsetting the environment variable.</para> + + <para>This environment variable is primarily useful to allow services to optionally upgrade their used log + protocol to the native journal protocol (using + <citerefentry><refentrytitle>sd_journal_print</refentrytitle><manvolnum>3</manvolnum></citerefentry> and other + functions) if their standard output or standard error output is connected to the journal anyway, thus enabling + delivery of structured metadata along with logged messages.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>$SERVICE_RESULT</varname></term> + + <listitem><para>Only defined for the service unit type, this environment variable is passed to all + <varname>ExecStop=</varname> and <varname>ExecStopPost=</varname> processes, and encodes the service + "result". Currently, the following values are defined: <literal>timeout</literal> (in case of an operation + timeout), <literal>exit-code</literal> (if a service process exited with a non-zero exit code; see + <varname>$EXIT_CODE</varname> below for the actual exit code returned), <literal>signal</literal> (if a + service process was terminated abnormally by a signal; see <varname>$EXIT_CODE</varname> below for the actual + signal used for the termination), <literal>core-dump</literal> (if a service process terminated abnormally and + dumped core), <literal>watchdog</literal> (if the watchdog keep-alive ping was enabled for the service but it + missed the deadline), or <literal>resources</literal> (a catch-all condition in case a system operation + failed).</para> + + <para>This environment variable is useful to monitor failure or successful termination of a service. Even + though this variable is available in both <varname>ExecStop=</varname> and <varname>ExecStopPost=</varname>, it + is usually a better choice to place monitoring tools in the latter, as the former is only invoked for services + that managed to start up correctly, and the latter covers both services that failed during their start-up and + those which failed during their runtime.</para></listitem> + </varlistentry> + + <varlistentry> + <term><varname>$EXIT_CODE</varname></term> + <term><varname>$EXIT_STATUS</varname></term> + + <listitem><para>Only defined for the service unit type, these environment variables are passed to all + <varname>ExecStop=</varname>, <varname>ExecStopPost=</varname> processes and contain exit status/code + information of the main process of the service. For the precise definition of the exit code and status, see + <citerefentry><refentrytitle>wait</refentrytitle><manvolnum>2</manvolnum></citerefentry>. <varname>$EXIT_CODE</varname> + is one of <literal>exited</literal>, <literal>killed</literal>, + <literal>dumped</literal>. <varname>$EXIT_STATUS</varname> contains the numeric exit code formatted as string + if <varname>$EXIT_CODE</varname> is <literal>exited</literal>, and the signal name in all other cases. Note + that these environment variables are only set if the service manager succeeded to start and identify the main + process of the service.</para> + + <table> + <title>Summary of possible service result variable values</title> + <tgroup cols='3'> + <colspec colname='result' /> + <colspec colname='status' /> + <colspec colname='code' /> + <thead> + <row> + <entry><varname>$SERVICE_RESULT</varname></entry> + <entry><varname>$EXIT_STATUS</varname></entry> + <entry><varname>$EXIT_CODE</varname></entry> + </row> + </thead> + + <tbody> + <row> + <entry morerows="1" valign="top"><literal>timeout</literal></entry> + <entry valign="top"><literal>killed</literal></entry> + <entry><literal>TERM</literal>, <literal>KILL</literal></entry> + </row> + + <row> + <entry valign="top"><literal>exited</literal></entry> + <entry><literal>0</literal>, <literal>1</literal>, <literal>2</literal>, <literal + >3</literal>, …, <literal>255</literal></entry> + </row> + + <row> + <entry valign="top"><literal>exit-code</literal></entry> + <entry valign="top"><literal>exited</literal></entry> + <entry><literal>0</literal>, <literal>1</literal>, <literal>2</literal>, <literal + >3</literal>, …, <literal>255</literal></entry> + </row> + + <row> + <entry valign="top"><literal>signal</literal></entry> + <entry valign="top"><literal>killed</literal></entry> + <entry><literal>HUP</literal>, <literal>INT</literal>, <literal>KILL</literal>, …</entry> + </row> + + <row> + <entry valign="top"><literal>core-dump</literal></entry> + <entry valign="top"><literal>dumped</literal></entry> + <entry><literal>ABRT</literal>, <literal>SEGV</literal>, <literal>QUIT</literal>, …</entry> + </row> + + <row> + <entry morerows="2" valign="top"><literal>watchdog</literal></entry> + <entry><literal>dumped</literal></entry> + <entry><literal>ABRT</literal></entry> + </row> + <row> + <entry><literal>killed</literal></entry> + <entry><literal>TERM</literal>, <literal>KILL</literal></entry> + </row> + <row> + <entry><literal>exited</literal></entry> + <entry><literal>0</literal>, <literal>1</literal>, <literal>2</literal>, <literal + >3</literal>, …, <literal>255</literal></entry> + </row> + + <row> + <entry><literal>resources</literal></entry> + <entry>any of the above</entry> + <entry>any of the above</entry> + </row> + + <row> + <entry namest="results" nameend="code">Note: the process may be also terminated by a signal not sent by systemd. In particular the process may send an arbitrary signal to itself in a handler for any of the non-maskable signals. Nevertheless, in the <literal>timeout</literal> and <literal>watchdog</literal> rows above only the signals that systemd sends have been included.</entry> + </row> + </tbody> + </tgroup> + </table> + + </listitem> + </varlistentry> </variablelist> <para>Additional variables may be configured by the following means: for processes spawned in specific units, use the - <varname>Environment=</varname> and - <varname>EnvironmentFile=</varname> options above; to specify + <varname>Environment=</varname>, <varname>EnvironmentFile=</varname> + and <varname>PassEnvironment=</varname> options above; to specify variables globally, use <varname>DefaultEnvironment=</varname> (see <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>) @@ -1275,10 +1771,12 @@ <citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry>, <citerefentry><refentrytitle>systemd.kill</refentrytitle><manvolnum>5</manvolnum></citerefentry>, <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>, + <citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry><refentrytitle>systemd.directives</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry><refentrytitle>tmpfiles.d</refentrytitle><manvolnum>5</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>exec</refentrytitle><manvolnum>3</manvolnum></citerefentry> </para> </refsect1> + </refentry> |