Age | Commit message (Collapse) | Author |
|
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
|
|
Confirm spawn fixes/enhancements
|
|
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).
This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
|
|
This changes a couple of things in the namespace handling:
It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structue, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.
This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.
While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified the root directory of the unit is
implicited prefixed.
This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.
This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.
Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
|
|
called
Let's expose the new bus functions we added in the previous commit in
systemctl.
|
|
|
|
|
|
This GUID was added in #2263, but the manpage was not updated.
|
|
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. It allows to mount huge btrfs volumes without issues.
This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.
fixes #4055
|
|
|
|
Allow setting custom port for the DHCP client to listen on in networkd.
[DHCP]
ListenPort=6677
|
|
|
|
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings .
This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).
commit 0527f1c
http://github.com/torvalds/linux/commit/3f1ac7a700d039c61d8d8b99f28d605d489a60cfommit
35afb33
|
|
|
|
core: add new RestrictNamespaces= unit file setting
Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
|
|
|
|
It was logical, but not entirely obvious, that 'e' with no arguments does
nothing. Expand the explanation a bit and add an example.
Fixes #4564.
|
|
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().
RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.
This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
|
|
|
|
systemd-analyze syscall-filter
|
|
Setting no_new_privs does not stop UID changes, but rather blocks
gaining privileges through execve(). Also fixes a small typo.
|
|
Just to make the whole thing easier for users.
|
|
This should make it easier for users to understand what each filter
means as the list of syscalls is updated in subsequent systemd versions.
|
|
|
|
Preserve stored fds over service restart
|
|
If execve() or socket() is filtered the service manager might get into trouble
executing the service binary, or handling any failures when this fails. Mention
this in the documentation.
The other option would be to implicitly whitelist all system calls that are
required for these codepaths. However, that appears less than desirable as this
would mean socket() and many related calls have to be whitelisted
unconditionally. As writing system call filters requires a certain level of
expertise anyway it sounds like the better option to simply document these
issues and suggest that the user disables system call filters in the service
temporarily in order to debug any such failures.
See: #3993.
|
|
@resources contains various syscalls that alter resource limits and memory and
scheduling parameters of processes. As such they are good candidates to block
for most services.
@basic-io contains a number of basic syscalls for I/O, similar to the list
seccomp v1 permitted but slightly more complete. It should be useful for
building basic whitelisting for minimal sandboxes
|
|
|
|
These system calls clearly fall in the @ipc category, hence should be listed
there, simply to avoid confusion and surprise by the user.
|
|
The system call is already part in @default hence implicitly allowed anyway.
Also, if it is actually blocked then systemd couldn't execute the service in
question anymore, since the application of seccomp is immediately followed by
it.
|
|
Timing and sleep are so basic operations, it makes very little sense to ever
block them, hence don't.
|
|
"Secondary arch" table for mips is entirely speculative…
|
|
|
|
This introduces a new option, `tcrypt-veracrypt`, that sets the
corresponding VeraCrypt flag in the flags passed to cryptsetup.
|
|
The first example wasn't phrased with "To ..." as the other three are,
and the last example was lacking the colon.
|
|
|
|
The option does more than the documentation gave it credit for.
|
|
Let's say that this was not obvious from our man page.
|
|
seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
|
|
Document NoNewPrivileges default value
|
|
|
|
detect-virt: add --private-users switch to check if a userns is active; add Condition=private-users
|
|
Various things don't work when we're running in a user namespace, but it's
pretty hard to reliably detect if that is true.
A function is added which looks at /proc/self/uid_map and returns false
if the default "0 0 UINT32_MAX" is found, and true if it finds anything else.
This misses the case where an 1:1 mapping with the full range was used, but
I don't know how to distinguish this case.
'systemd-detect-virt --private-users' is very similar to
'systemd-detect-virt --chroot', but we check for a user namespace instead.
|
|
To more correctly reflect current behaviour as well as to provide
a few more details.
|
|
shmat(..., SHM_EXEC) can be used to create writable and executable
memory, so let's block it when MemoryDenyWriteExecute is set.
|
|
... and that that content might be outdated.
|
|
various nss module/resolved fixes
|
|
Fixes #4329.
|
|
|
|
This unifies the suggested nsswitch.conf configuration for our four NSS modules to this:
hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname
Note that this restores "myhostname" to the suggested configuration of
nss-resolve for the time being, undoing 4484e1792b64b01614f04b7bde97bf019f601bf9.
"myhostname" should probably be dropped eventually, but when we do this we
should do it in full, and not only drop it from the suggested nsswitch.conf
for one of the modules, but also drop it in source and stop referring to it
altogether.
Note that nss-resolve doesn't replace nss-myhostname in full: the former only
works if D-Bus/resolved is available for resolving the local hostname, the
latter works in all cases even if D-Bus or resolved are not in operation, hence
there's some value in keeping the line as it is right now. Note that neither
dns nor myhostname are considered at all with the above configuration unless
the resolve module actually returns UNAVAIL. Thus, even though handling of
local hostname resolving is implemented twice this way it is only executed once
for each lookup.
|