summaryrefslogtreecommitdiff
path: root/man
diff options
context:
space:
mode:
authorLennart Poettering <lennart@poettering.net>2016-11-01 20:25:19 -0600
committerLennart Poettering <lennart@poettering.net>2016-11-04 07:40:13 -0600
commitadd005357d535681c7075ced8eec2b6e61b43728 (patch)
treeb780280f06df0b09c738173602cb90c599597996 /man
parent9156493171cf2d78e1ac1a3746c385b0e281acf1 (diff)
core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.
Diffstat (limited to 'man')
-rw-r--r--man/systemd.exec.xml50
1 files changed, 34 insertions, 16 deletions
diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml
index 0973f4047a..3b39a9c912 100644
--- a/man/systemd.exec.xml
+++ b/man/systemd.exec.xml
@@ -1234,22 +1234,16 @@
<varlistentry>
<term><varname>NoNewPrivileges=</varname></term>
- <listitem><para>Takes a boolean argument. If true, ensures that the service
- process and all its children can never gain new privileges through
- <function>execve</function> (e.g. via setuid or setgid bits, or filesystem
- capabilities). This is the simplest and most effective way to ensure that
- a process and its children can never elevate privileges again. Defaults to false,
- but in the user manager instance certain settings force
- <varname>NoNewPrivileges=yes</varname>, ignoring the value of this setting.
- This is the case when <varname>SystemCallFilter=</varname>,
- <varname>SystemCallArchitectures=</varname>,
- <varname>RestrictAddressFamilies=</varname>,
- <varname>PrivateDevices=</varname>,
- <varname>ProtectKernelTunables=</varname>,
- <varname>ProtectKernelModules=</varname>,
- <varname>MemoryDenyWriteExecute=</varname>, or
- <varname>RestrictRealtime=</varname> are specified.
- </para></listitem>
+ <listitem><para>Takes a boolean argument. If true, ensures that the service process and all its children can
+ never gain new privileges through <function>execve()</function> (e.g. via setuid or setgid bits, or filesystem
+ capabilities). This is the simplest and most effective way to ensure that a process and its children can never
+ elevate privileges again. Defaults to false, but in the user manager instance certain settings force
+ <varname>NoNewPrivileges=yes</varname>, ignoring the value of this setting. This is the case when
+ <varname>SystemCallFilter=</varname>, <varname>SystemCallArchitectures=</varname>,
+ <varname>RestrictAddressFamilies=</varname>, <varname>RestrictNamespaces=</varname>,
+ <varname>PrivateDevices=</varname>, <varname>ProtectKernelTunables=</varname>,
+ <varname>ProtectKernelModules=</varname>, <varname>MemoryDenyWriteExecute=</varname>, or
+ <varname>RestrictRealtime=</varname> are specified.</para></listitem>
</varlistentry>
<varlistentry>
@@ -1468,6 +1462,30 @@
</varlistentry>
<varlistentry>
+ <term><varname>RestrictNamespaces=</varname></term>
+
+ <listitem><para>Restricts access to Linux namespace functionality for the processes of this unit. For details
+ about Linux namespaces, see
+ <citerefentry><refentrytitle>namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>. Either takes a
+ boolean argument, or a space-separated list of namespace type identifiers. If false (the default), no
+ restrictions on namespace creation and switching are made. If true, access to any kind of namespacing is
+ prohibited. Otherwise, a space-separated list of namespace type identifiers must be specified, consisting of
+ any combination of: <constant>cgroup</constant>, <constant>ipc</constant>, <constant>net</constant>,
+ <constant>mnt</constant>, <constant>pid</constant>, <constant>user</constant> and <constant>uts</constant>. Any
+ namespace type listed is made accessible to the unit's processes, access to namespace types not listed is
+ prohibited (whitelisting). By prepending the list with a single tilda character (<literal>~</literal>) the
+ effect may be inverted: only the listed namespace types will be made inaccessible, all unlisted ones are
+ permitted (blacklisting). If the empty string is assigned, the default namespace restrictions are applied,
+ which is equivalent to false. Internally, this setting limits access to the
+ <citerefentry><refentrytitle>unshare</refentrytitle><manvolnum>2</manvolnum></citerefentry>,
+ <citerefentry><refentrytitle>clone</refentrytitle><manvolnum>2</manvolnum></citerefentry> and
+ <citerefentry><refentrytitle>setns</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls, taking
+ the specified flags parameters into account. Note that — if this option is used — in addition to restricting
+ creation and switching of the specified types of namespaces (or all of them, if true) access to the
+ <function>setns()</function> system call with a zero flags parameter is prohibited.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><varname>ProtectKernelModules=</varname></term>
<listitem><para>Takes a boolean argument. If true, explicit module loading will