summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--man/systemd.exec.xml214
1 files changed, 92 insertions, 122 deletions
diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml
index 67182f17dc..84f81fe38e 100644
--- a/man/systemd.exec.xml
+++ b/man/systemd.exec.xml
@@ -877,48 +877,34 @@
<term><varname>ReadOnlyPaths=</varname></term>
<term><varname>InaccessiblePaths=</varname></term>
- <listitem><para>Sets up a new file system namespace for
- executed processes. These options may be used to limit access
- a process might have to the main file system hierarchy. Each
- setting takes a space-separated list of paths relative to
- the host's root directory (i.e. the system running the service manager).
- Note that if entries contain symlinks, they are resolved from the host's root directory as well.
- Entries (files or directories) listed in
- <varname>ReadWritePaths=</varname> are accessible from
- within the namespace with the same access rights as from
- outside. Entries listed in
- <varname>ReadOnlyPaths=</varname> are accessible for
- reading only, writing will be refused even if the usual file
- access controls would permit this. Entries listed in
- <varname>InaccessiblePaths=</varname> will be made
- inaccessible for processes inside the namespace, and may not
- countain any other mountpoints, including those specified by
- <varname>ReadWritePaths=</varname> or
- <varname>ReadOnlyPaths=</varname>.
- Note that restricting access with these options does not extend
- to submounts of a directory that are created later on.
- Non-directory paths can be specified as well. These
- options may be specified more than once, in which case all
- paths listed will have limited access from within the
- namespace. If the empty string is assigned to this option, the
- specific list is reset, and all prior assignments have no
- effect.</para>
- <para>Paths in
- <varname>ReadOnlyPaths=</varname>
- and
- <varname>InaccessiblePaths=</varname>
- may be prefixed with
- <literal>-</literal>, in which case
- they will be ignored when they do not
- exist. Note that using this
- setting will disconnect propagation of
- mounts from the service to the host
- (propagation in the opposite direction
- continues to work). This means that
- this setting may not be used for
- services which shall be able to
- install mount points in the main mount
- namespace.</para></listitem>
+ <listitem><para>Sets up a new file system namespace for executed processes. These options may be used to limit
+ access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths
+ relative to the host's root directory (i.e. the system running the service manager). Note that if paths
+ contain symlinks, they are resolved relative to the root directory set with
+ <varname>RootDirectory=</varname>.</para>
+
+ <para>Paths listed in <varname>ReadWritePaths=</varname> are accessible from within the namespace with the same
+ access modes as from outside of it. Paths listed in <varname>ReadOnlyPaths=</varname> are accessible for
+ reading only, writing will be refused even if the usual file access controls would permit this. Nest
+ <varname>ReadWritePaths=</varname> inside of <varname>ReadOnlyPaths=</varname> in order to provide writable
+ subdirectories within read-only directories. Use <varname>ReadWritePaths=</varname> in order to whitelist
+ specific paths for write access if <varname>ProtectSystem=strict</varname> is used. Paths listed in
+ <varname>InaccessiblePaths=</varname> will be made inaccessible for processes inside the namespace (along with
+ everything below them in the file system hierarchy).</para>
+
+ <para>Note that restricting access with these options does not extend to submounts of a directory that are
+ created later on. Non-directory paths may be specified as well. These options may be specified more than once,
+ in which case all paths listed will have limited access from within the namespace. If the empty string is
+ assigned to this option, the specific list is reset, and all prior assignments have no effect.</para>
+
+ <para>Paths in <varname>ReadOnlyPaths=</varname> and <varname>InaccessiblePaths=</varname> may be prefixed with
+ <literal>-</literal>, in which case they will be ignored when they do not exist. Note that using this setting
+ will disconnect propagation of mounts from the service to the host (propagation in the opposite direction
+ continues to work). This means that this setting may not be used for services which shall be able to install
+ mount points in the main mount namespace. Note that the effect of these settings may be undone by privileged
+ processes. In order to set up an effective sandboxed environment for a unit it is thus recommended to combine
+ these settings with either <varname>CapabilityBoundingSet=~CAP_SYS_ADMIN</varname> or
+ <varname>SystemCallFilter=~@mount</varname>.</para></listitem>
</varlistentry>
<varlistentry>
@@ -933,37 +919,30 @@
private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the
<varname>JoinsNamespaceOf=</varname> directive, see
<citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
- details. Note that using this setting will disconnect propagation of mounts from the service to the host
- (propagation in the opposite direction continues to work). This means that this setting may not be used for
- services which shall be able to install mount points in the main mount namespace. This setting is implied if
- <varname>DynamicUser=</varname> is set.</para></listitem>
+ details. This setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same
+ restrictions regarding mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and
+ related calls, see above.</para></listitem>
+
</varlistentry>
<varlistentry>
<term><varname>PrivateDevices=</varname></term>
- <listitem><para>Takes a boolean argument. If true, sets up a
- new /dev namespace for the executed processes and only adds
- API pseudo devices such as <filename>/dev/null</filename>,
- <filename>/dev/zero</filename> or
- <filename>/dev/random</filename> (as well as the pseudo TTY
- subsystem) to it, but no physical devices such as
- <filename>/dev/sda</filename>. This is useful to securely turn
- off physical device access by the executed process. Defaults
- to false. Enabling this option will also remove
- <constant>CAP_MKNOD</constant> from the capability bounding
- set for the unit (see above), and set
- <varname>DevicePolicy=closed</varname> (see
+ <listitem><para>Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and
+ only adds API pseudo devices such as <filename>/dev/null</filename>, <filename>/dev/zero</filename> or
+ <filename>/dev/random</filename> (as well as the pseudo TTY subsystem) to it, but no physical devices such as
+ <filename>/dev/sda</filename>. This is useful to securely turn off physical device access by the executed
+ process. Defaults to false. Enabling this option will also remove <constant>CAP_MKNOD</constant> from the
+ capability bounding set for the unit (see above), and set <varname>DevicePolicy=closed</varname> (see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
- for details). Note that using this setting will disconnect
- propagation of mounts from the service to the host
- (propagation in the opposite direction continues to work).
- This means that this setting may not be used for services
- which shall be able to install mount points in the main mount
- namespace. The /dev namespace will be mounted read-only and 'noexec'.
- The latter may break old programs which try to set up executable
- memory by using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- of <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>.</para></listitem>
+ for details). Note that using this setting will disconnect propagation of mounts from the service to the host
+ (propagation in the opposite direction continues to work). This means that this setting may not be used for
+ services which shall be able to install mount points in the main mount namespace. The /dev namespace will be
+ mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by
+ using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> of
+ <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>. This setting is implied if
+ <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding mount propagation and
+ privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
</varlistentry>
<varlistentry>
@@ -1023,33 +1002,23 @@
operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is
recommended to enable this setting for all long-running services, unless they are involved with system updates
or need to modify the operating system in other ways. If this option is used,
- <varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. Note
- that processes retaining the <constant>CAP_SYS_ADMIN</constant> capability (and with no system call filter that
- prohibits mount-related system calls applied) can undo the effect of this setting. This setting is hence
- particularly useful for daemons which have this either the <literal>@mount</literal> set filtered using
- <varname>SystemCallFilter=</varname>, or have the <constant>CAP_SYS_ADMIN</constant> capability removed, for
- example with <varname>CapabilityBoundingSet=</varname>. Defaults to off.</para></listitem>
+ <varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. This
+ setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding
+ mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
+ above. Defaults to off.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>ProtectHome=</varname></term>
- <listitem><para>Takes a boolean argument or
- <literal>read-only</literal>. If true, the directories
- <filename>/home</filename>, <filename>/root</filename> and
- <filename>/run/user</filename>
- are made inaccessible and empty for processes invoked by this
- unit. If set to <literal>read-only</literal>, the three
- directories are made read-only instead. It is recommended to
- enable this setting for all long-running services (in
- particular network-facing ones), to ensure they cannot get
- access to private user data, unless the services actually
- require access to the user's private data. Note however that
- processes retaining the CAP_SYS_ADMIN capability can undo the
- effect of this setting. This setting is hence particularly
- useful for daemons which have this capability removed, for
- example with <varname>CapabilityBoundingSet=</varname>.
- Defaults to off.</para></listitem>
+ <listitem><para>Takes a boolean argument or <literal>read-only</literal>. If true, the directories
+ <filename>/home</filename>, <filename>/root</filename> and <filename>/run/user</filename> are made inaccessible
+ and empty for processes invoked by this unit. If set to <literal>read-only</literal>, the three directories are
+ made read-only instead. It is recommended to enable this setting for all long-running services (in particular
+ network-facing ones), to ensure they cannot get access to private user data, unless the services actually
+ require access to the user's private data. This setting is implied if <varname>DynamicUser=</varname> is
+ set. For this setting the same restrictions regarding mount propagation and privileges apply as for
+ <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
</varlistentry>
<varlistentry>
@@ -1059,48 +1028,41 @@
<filename>/proc/sys</filename> and <filename>/sys</filename> will be made read-only to all processes of the
unit. Usually, tunable kernel variables should only be written at boot-time, with the
<citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> mechanism. Almost
- no services need to write to these at runtime; it is hence recommended to turn this on for most
- services. Defaults to off.</para></listitem>
+ no services need to write to these at runtime; it is hence recommended to turn this on for most services. For
+ this setting the same restrictions regarding mount propagation and privileges apply as for
+ <varname>ReadOnlyPaths=</varname> and related calls, see above. Defaults to off.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>ProtectControlGroups=</varname></term>
- <listitem><para>Takes a boolean argument. If true, the Linux Control Groups ("cgroups") hierarchies accessible
- through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the unit. Except for
- container managers no services should require write access to the control groups hierarchies; it is hence
- recommended to turn this on for most services. Defaults to off.</para></listitem>
+ <listitem><para>Takes a boolean argument. If true, the Linux Control Groups (<citerefentry
+ project='man-pages'><refentrytitle>cgroups</refentrytitle><manvolnum>7</manvolnum></citerefentry>) hierarchies
+ accessible through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the
+ unit. Except for container managers no services should require write access to the control groups hierarchies;
+ it is hence recommended to turn this on for most services. For this setting the same restrictions regarding
+ mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
+ above. Defaults to off.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>MountFlags=</varname></term>
- <listitem><para>Takes a mount propagation flag:
- <option>shared</option>, <option>slave</option> or
- <option>private</option>, which control whether mounts in the
- file system namespace set up for this unit's processes will
- receive or propagate mounts or unmounts. See
- <citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- for details. Defaults to <option>shared</option>. Use
- <option>shared</option> to ensure that mounts and unmounts are
- propagated from the host to the container and vice versa. Use
- <option>slave</option> to run processes so that none of their
- mounts and unmounts will propagate to the host. Use
- <option>private</option> to also ensure that no mounts and
- unmounts from the host will propagate into the unit processes'
- namespace. Note that <option>slave</option> means that file
- systems mounted on the host might stay mounted continuously in
- the unit's namespace, and thus keep the device busy. Note that
- the file system namespace related options
- (<varname>PrivateTmp=</varname>,
- <varname>PrivateDevices=</varname>,
- <varname>ProtectSystem=</varname>,
- <varname>ProtectHome=</varname>,
- <varname>ReadOnlyPaths=</varname>,
- <varname>InaccessiblePaths=</varname> and
- <varname>ReadWritePaths=</varname>) require that mount
- and unmount propagation from the unit's file system namespace
- is disabled, and hence downgrade <option>shared</option> to
+ <listitem><para>Takes a mount propagation flag: <option>shared</option>, <option>slave</option> or
+ <option>private</option>, which control whether mounts in the file system namespace set up for this unit's
+ processes will receive or propagate mounts or unmounts. See <citerefentry
+ project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
+ details. Defaults to <option>shared</option>. Use <option>shared</option> to ensure that mounts and unmounts
+ are propagated from the host to the container and vice versa. Use <option>slave</option> to run processes so
+ that none of their mounts and unmounts will propagate to the host. Use <option>private</option> to also ensure
+ that no mounts and unmounts from the host will propagate into the unit processes' namespace. Note that
+ <option>slave</option> means that file systems mounted on the host might stay mounted continuously in the
+ unit's namespace, and thus keep the device busy. Note that the file system namespace related options
+ (<varname>PrivateTmp=</varname>, <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>,
+ <varname>ProtectHome=</varname>, <varname>ProtectKernelTunables=</varname>,
+ <varname>ProtectControlGroups=</varname>, <varname>ReadOnlyPaths=</varname>,
+ <varname>InaccessiblePaths=</varname>, <varname>ReadWritePaths=</varname>) require that mount and unmount
+ propagation from the unit's file system namespace is disabled, and hence downgrade <option>shared</option> to
<option>slave</option>. </para></listitem>
</varlistentry>
@@ -1335,7 +1297,15 @@
</table>
Note, that as new system calls are added to the kernel, additional system calls might be added to the groups
- above, so the contents of the sets may change between systemd versions.</para></listitem>
+ above, so the contents of the sets may change between systemd versions.</para>
+
+ <para>It is recommended to combine the file system namespacing related options with
+ <varname>SystemCallFilter=~@mount</varname>, in order to prohibit the unit's processes to undo the
+ mappings. Specifically these are the options <varname>PrivateTmp=</varname>,
+ <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>, <varname>ProtectHome=</varname>,
+ <varname>ProtectKernelTunables=</varname>, <varname>ProtectControlGroups=</varname>,
+ <varname>ReadOnlyPaths=</varname>, <varname>InaccessiblePaths=</varname> and
+ <varname>ReadWritePaths=</varname>.</para></listitem>
</varlistentry>
<varlistentry>