summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2016-09-25core:sandbox: add more /proc/* entries to ProtectKernelTunables=Djalal Harouni
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface, filesystems configuration and IRQ tuning readonly. Most of these interfaces now days should be in /sys but they are still available through /proc, so just protect them. This patch does not touch /proc/net/...
2016-09-25core:namespace: simplify mount calculationDjalal Harouni
Move out mount calculation on its own function. Actually the logic is smart enough to later drop nop and duplicates mounts, this change improves code readability. --- src/core/namespace.c | 47 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 36 insertions(+), 11 deletions(-)
2016-09-25core:namespace: put paths protected by ProtectKernelTunables= inDjalal Harouni
Instead of having all these paths everywhere, put the ones that are protected by ProtectKernelTunables= into their own table. This way it is easy to add paths and track which ones are protected.
2016-09-25core:namespace: minor improvements to append_mounts()Djalal Harouni
2016-09-25execute: move SMACK setup code into its own functionLennart Poettering
While we are at it, move PAM code #ifdeffery into setup_pam() to simplify the main execution logic a bit.
2016-09-25namespace: drop all mounts outside of the new root directoryLennart Poettering
There's no point in mounting these, if they are outside of the root directory we'll move to.
2016-09-25main: minor simplificationLennart Poettering
2016-09-25execute: filter low-level I/O syscalls if PrivateDevices= is setLennart Poettering
If device access is restricted via PrivateDevices=, let's also block the various low-level I/O syscalls at the same time, so that we know that the minimal set of devices in our virtualized /dev are really everything the unit can access.
2016-09-25namespace: don't make the root directory of a namespace a mount if it ↵Lennart Poettering
already is one Let's not stack mounts needlessly.
2016-09-25namespace: chase symlinks for mounts to set up in userspaceLennart Poettering
This adds logic to chase symlinks for all mount points that shall be created in a namespace environment in userspace, instead of leaving this to the kernel. This has the advantage that we can correctly handle absolute symlinks that shall be taken relative to a specific root directory. Moreover, we can properly handle mounts created on symlinked files or directories as we can merge their mounts as necessary. (This also drops the "done" flag in the namespace logic, which was never actually working, but was supposed to permit a partial rollback of the namespace logic, which however is only mildly useful as it wasn't clear in which case it would or would not be able to roll back.) Fixes: #3867
2016-09-25namespace: invoke unshare() only after checking all parametersLennart Poettering
Let's create the new namespace only after we validated and processed all parameters, right before we start with actually mounting things. This way, the window where we can roll back is larger (not that it matters IRL...)
2016-09-25execute: drop group priviliges only after setting up namespaceLennart Poettering
If PrivateDevices=yes is set, the namespace code creates device nodes in /dev that should be owned by the host's root, hence let's make sure we set up the namespace before dropping group privileges.
2016-09-25nspawn: let's mount /proc/sysrq-trigger read-only by defaultLennart Poettering
LXC does this, and we should probably too. Better safe than sorry.
2016-09-25core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1Lennart Poettering
Let's make sure that services that use DynamicUser=1 cannot leave files in the file system should the system accidentally have a world-writable directory somewhere. This effectively ensures that directories need to be whitelisted rather than blacklisted for access when DynamicUser=1 is set.
2016-09-25core: introduce ProtectSystem=strictLennart Poettering
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a new setting "strict". If set, the entire directory tree of the system is mounted read-only, but the API file systems /proc, /dev, /sys are excluded (they may be managed with PrivateDevices= and ProtectKernelTunables=). Also, /home and /root are excluded as those are left for ProtectHome= to manage. In this mode, all "real" file systems (i.e. non-API file systems) are mounted read-only, and specific directories may only be excluded via ReadWriteDirectories=, thus implementing an effective whitelist instead of blacklist of writable directories. While we are at, also add /efi to the list of paths always affected by ProtectSystem=. This is a follow-up for b52a109ad38cd37b660ccd5394ff5c171a5e5355 which added /efi as alternative for /boot. Our namespacing logic should respect that too.
2016-09-25namespace: add some debug logging when enforcing InaccessiblePaths=Lennart Poettering
2016-09-25namespace: rework how ReadWritePaths= is appliedLennart Poettering
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths= specification, then we'd first recursively apply the ReadOnlyPaths= paths, and make everything below read-only, only in order to then flip the read-only bit again for the subdirs listed in ReadWritePaths= below it. This is not only ugly (as for the dirs in question we first turn on the RO bit, only to turn it off again immediately after), but also problematic in containers, where a container manager might have marked a set of dirs read-only and this code will undo this is ReadWritePaths= is set for any. With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be applied to the children listed in ReadWritePaths= in the first place, so that we do not need to turn off the RO bit for those after all. This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the RO bit, but never to turn it off again. Or to say this differently: if some dirs are marked read-only via some external tool, then ReadWritePaths= will not undo it. This is not only the safer option, but also more in-line with what the man page currently claims: "Entries (files or directories) listed in ReadWritePaths= are accessible from within the namespace with the same access rights as from outside." To implement this change bind_remount_recursive() gained a new "blacklist" string list parameter, which when passed may contain subdirs that shall be excluded from the read-only mounting. A number of functions are updated to add more debug logging to make this more digestable.
2016-09-25namespace: when enforcing fs namespace restrictions suppress redundant mountsLennart Poettering
If /foo is marked to be read-only, and /foo/bar too, then the latter may be suppressed as it has no effect.
2016-09-25namespace: simplify mount_path_compare() a bitLennart Poettering
2016-09-25execute: if RuntimeDirectory= is set, it should be writableLennart Poettering
Implicitly make all dirs set with RuntimeDirectory= writable, as the concept otherwise makes no sense.
2016-09-25execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.cLennart Poettering
This adds a new call get_user_creds_clean(), which is just like get_user_creds() but returns NULL in the home/shell parameters if they contain no useful information. This code previously lived in execute.c, but by generalizing this we can reuse it in run.c.
2016-09-25execute: split out creation of runtime dirs into its own functionsLennart Poettering
2016-09-25namespace: make sure InaccessibleDirectories= masks all mounts further downLennart Poettering
If a dir is marked to be inaccessible then everything below it should be masked by it.
2016-09-25core: add two new service settings ProtectKernelTunables= and ↵Lennart Poettering
ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.
2016-09-25core: enforce seccomp for secondary archs too, for all rulesLennart Poettering
Let's make sure that all our rules apply to all archs the local kernel supports.
2016-09-24Merge pull request #4182 from jkoelker/routetableZbigniew Jędrzejewski-Szmek
2016-09-24networkd: do not drop config for pending interfaces (#4187)Martin Pitt
While an interface is still being processed by udev, it is in state "pending", instead of "unmanaged". We must not flush device configuration then. Further fixes commit 3104883ddc24 after commit c436d55397. Fixes #4186
2016-09-24kernel-install: allow plugins to terminate the procedure (#4174)Zbigniew Jędrzejewski-Szmek
Replaces #4103.
2016-09-24Merge pull request #4207 from fbuihuu/fix-journal-hmac-calculationZbigniew Jędrzejewski-Szmek
Fix journal hmac calculation.
2016-09-24sysctl: configure kernel parameters in the order they occur in each sysctl ↵HATAYAMA Daisuke
configuration files (#4205) Currently, systemd-sysctl command configures kernel parameters in each sysctl configuration files in random order due to characteristics of iterator of Hashmap. However, kernel parameters need to be configured in the order they occur in each sysctl configuration files. - For example, consider fs.suid_coredump and kernel.core_pattern. If fs.suid_coredump=2 is configured before kernel.core_pattern= whose default value is "core", then kernel outputs the following message: Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required. Note that the security issue mentioned in this message has already been fixed on recent kernels, so this is just a warning message on such kernels. But it's still confusing to users that this message is output on some boot and not output on another boot. - I don't know but there could be other kernel parameters that are significant in the order they are configured. - The legacy sysctl command configures kernel parameters in the order they occur in each sysctl configuration files. Although I didn't find any official specification explaining this behavior of sysctl command, I don't think there is any meaningful reason to change this behavior, in particular, to the random one. This commit does the change by simply using OrderedHashmap instead of Hashmap.
2016-09-24nspawn: decouple --boot from CLONE_NEWIPC (#4180)Luca Bruno
This commit is a minor tweak after the split of `--share-system`, decoupling the `--boot` option from IPC namespacing. Historically there has been a single `--share-system` option for sharing IPC/PID/UTS with the host, which was incompatible with boot/pid1 mode. After the split, it is now possible to express the requirements with better granularity. For reference, this is a followup to #4023 which contains references to previous discussions. I realized too late that CLONE_NEWIPC is not strictly needed for boot mode.
2016-09-23journal: fix HMAC calculation when appending a data objectFranck Bui
Since commit 5996c7c295e073ce21d41305169132c8aa993ad0 (v190 !), the calculation of the HMAC is broken because the hash for a data object including a field is done in the wrong order: the field object is hashed before the data object is. However during verification, the hash is done in the opposite order as objects are scanned sequentially.
2016-09-23journal: warn when we fail to append a tag to a journalFranck Bui
We shouldn't silently fail when appending the tag to a journal file since FSS protection will simply be disabled in this case.
2016-09-22machine: Disable more output when quiet flag is set (#4196)Wilhelm Schuster
2016-09-20nspawn: fix comment typo in setup_timezone example (#4183)Michael Pope
2016-09-19networkd: Allow specifying RouteTable for RAsJason Kölker
2016-09-19networkd: Allow specifying RouteTable for DHCPJason Kölker
2016-09-18journal: fix typo in comment (#4176)Felix Zhang
2016-09-17Revert "kernel-install: Add KERNEL_INSTALL_NOOP (#4103)"Martin Pitt
Further discussion showed that this better gets addressed at the packaging level. This reverts commit 34210af7c63640fca1fd4a09fc23b01a8cd70bf3.
2016-09-17Merge pull request #4123 from keszybz/network-file-dropinsMartin Pitt
Network file dropins
2016-09-17nspawn: clarify log warning for /etc/localtime not being a symbolic link (#4163)Michael Pope
2016-09-16networkd: change message about missing KindZbigniew Jędrzejewski-Szmek
If Kind is not specied, the message about "Invalid Kind" was misleading. If Kind was specified in an invalid way, we get a message in the parsing phase anyway. Reword the message to cover both cases better.
2016-09-16networkd: support drop-in dirs for .network filesZbigniew Jędrzejewski-Szmek
2016-09-16shared/conf-parser: add config_parse_many which takes strv with dirsZbigniew Jędrzejewski-Szmek
This way we don't have to create a nulstr just to unpack it in a moment.
2016-09-16tree-wide: rename config_parse_many to …_nulstrZbigniew Jędrzejewski-Szmek
In preparation for adding a version which takes a strv.
2016-09-16networkd: support drop-in directories for .network filesJean-Sébastien Bour
Fixes #3655. [zj: Fix the tests.]
2016-09-16Updated formatting for printing the key for FSS (#4165)hi117
The key used to be jammed next to the local file path. Based on the format string on line 1675, I determined that the order of arguments was written incorrectly, and updated the function based on that assumption. Before: ``` Please write down the following secret verification key. It should be stored at a safe location and should not be saved locally on disk. /var/log/journal/9b47c1a5b339412887a197b7654673a7/fss8f66d6-f0a998-f782d0-1fe522/18fdb8-35a4e900 The sealing key is automatically changed every 15min. ``` After: ``` Please write down the following secret verification key. It should be stored at a safe location and should not be saved locally on disk. d53ed4-cc43d6-284e10-8f0324/18fdb8-35a4e900 The sealing key is automatically changed every 15min. ```
2016-09-15Merge pull request #4131 from intelfx/update-done-timestamps-precisionZbigniew Jędrzejewski-Szmek
condition: ignore nanoseconds in timestamps for ConditionNeedsUpdate= Fixes #4130.
2016-09-16logind: fix /run/user/$UID creation in apparmor-confined containers (#4154)Tomáš Janoušek
When a docker container is confined with AppArmor [1] and happens to run on top of a kernel that supports mount mediation [2], e.g. any Ubuntu kernel, mount(2) returns EACCES instead of EPERM. This then leads to: systemd-logind[33]: Failed to mount per-user tmpfs directory /run/user/1000: Permission denied login[42]: pam_systemd(login:session): Failed to create session: Access denied and user sessions don't start. This also applies to selinux that too returns EACCES on mount denial. [1] https://github.com/docker/docker/blob/master/docs/security/apparmor.md#understand-the-policies [2] http://bazaar.launchpad.net/~apparmor-dev/apparmor/master/view/head:/kernel-patches/4.7/0025-UBUNTU-SAUCE-apparmor-Add-the-ability-to-mediate-mou.patch
2016-09-15test-execute: fix %n typo (#4153)Zbigniew Jędrzejewski-Szmek