summaryrefslogtreecommitdiff
path: root/src/core/namespace.c
AgeCommit message (Collapse)Author
2017-05-20./tools/notsd-moveLuke Shumaker
2016-10-12core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=Djalal Harouni
Lets go further and make /lib/modules/ inaccessible for services that do not have business with modules, this is a minor improvment but it may help on setups with custom modules and they are limited... in regard of kernel auto-load feature. This change introduce NameSpaceInfo struct which we may embed later inside ExecContext but for now lets just reduce the argument number to setup_namespace() and merge ProtectKernelModules feature.
2016-09-25core:namespace: simplify ProtectHome= implementationDjalal Harouni
As with previous patch simplify ProtectHome and don't care about duplicates, they will be sorted by most restrictive mode and cleaned.
2016-09-25core: simplify ProtectSystem= implementationDjalal Harouni
ProtectSystem= with all its different modes and other options like PrivateDevices= + ProtectKernelTunables= + ProtectHome= are orthogonal, however currently it's a bit hard to parse that from the implementation view. Simplify it by giving each mode its own table with all paths and references to other Protect options. With this change some entries are duplicated, but we do not care since duplicate mounts are first sorted by the most restrictive mode then cleaned.
2016-09-25core:sandbox: add more /proc/* entries to ProtectKernelTunables=Djalal Harouni
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface, filesystems configuration and IRQ tuning readonly. Most of these interfaces now days should be in /sys but they are still available through /proc, so just protect them. This patch does not touch /proc/net/...
2016-09-25core:namespace: simplify mount calculationDjalal Harouni
Move out mount calculation on its own function. Actually the logic is smart enough to later drop nop and duplicates mounts, this change improves code readability. --- src/core/namespace.c | 47 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 36 insertions(+), 11 deletions(-)
2016-09-25core:namespace: put paths protected by ProtectKernelTunables= inDjalal Harouni
Instead of having all these paths everywhere, put the ones that are protected by ProtectKernelTunables= into their own table. This way it is easy to add paths and track which ones are protected.
2016-09-25core:namespace: minor improvements to append_mounts()Djalal Harouni
2016-09-25namespace: drop all mounts outside of the new root directoryLennart Poettering
There's no point in mounting these, if they are outside of the root directory we'll move to.
2016-09-25namespace: don't make the root directory of a namespace a mount if it ↵Lennart Poettering
already is one Let's not stack mounts needlessly.
2016-09-25namespace: chase symlinks for mounts to set up in userspaceLennart Poettering
This adds logic to chase symlinks for all mount points that shall be created in a namespace environment in userspace, instead of leaving this to the kernel. This has the advantage that we can correctly handle absolute symlinks that shall be taken relative to a specific root directory. Moreover, we can properly handle mounts created on symlinked files or directories as we can merge their mounts as necessary. (This also drops the "done" flag in the namespace logic, which was never actually working, but was supposed to permit a partial rollback of the namespace logic, which however is only mildly useful as it wasn't clear in which case it would or would not be able to roll back.) Fixes: #3867
2016-09-25namespace: invoke unshare() only after checking all parametersLennart Poettering
Let's create the new namespace only after we validated and processed all parameters, right before we start with actually mounting things. This way, the window where we can roll back is larger (not that it matters IRL...)
2016-09-25core: introduce ProtectSystem=strictLennart Poettering
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a new setting "strict". If set, the entire directory tree of the system is mounted read-only, but the API file systems /proc, /dev, /sys are excluded (they may be managed with PrivateDevices= and ProtectKernelTunables=). Also, /home and /root are excluded as those are left for ProtectHome= to manage. In this mode, all "real" file systems (i.e. non-API file systems) are mounted read-only, and specific directories may only be excluded via ReadWriteDirectories=, thus implementing an effective whitelist instead of blacklist of writable directories. While we are at, also add /efi to the list of paths always affected by ProtectSystem=. This is a follow-up for b52a109ad38cd37b660ccd5394ff5c171a5e5355 which added /efi as alternative for /boot. Our namespacing logic should respect that too.
2016-09-25namespace: add some debug logging when enforcing InaccessiblePaths=Lennart Poettering
2016-09-25namespace: rework how ReadWritePaths= is appliedLennart Poettering
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths= specification, then we'd first recursively apply the ReadOnlyPaths= paths, and make everything below read-only, only in order to then flip the read-only bit again for the subdirs listed in ReadWritePaths= below it. This is not only ugly (as for the dirs in question we first turn on the RO bit, only to turn it off again immediately after), but also problematic in containers, where a container manager might have marked a set of dirs read-only and this code will undo this is ReadWritePaths= is set for any. With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be applied to the children listed in ReadWritePaths= in the first place, so that we do not need to turn off the RO bit for those after all. This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the RO bit, but never to turn it off again. Or to say this differently: if some dirs are marked read-only via some external tool, then ReadWritePaths= will not undo it. This is not only the safer option, but also more in-line with what the man page currently claims: "Entries (files or directories) listed in ReadWritePaths= are accessible from within the namespace with the same access rights as from outside." To implement this change bind_remount_recursive() gained a new "blacklist" string list parameter, which when passed may contain subdirs that shall be excluded from the read-only mounting. A number of functions are updated to add more debug logging to make this more digestable.
2016-09-25namespace: when enforcing fs namespace restrictions suppress redundant mountsLennart Poettering
If /foo is marked to be read-only, and /foo/bar too, then the latter may be suppressed as it has no effect.
2016-09-25namespace: simplify mount_path_compare() a bitLennart Poettering
2016-09-25namespace: make sure InaccessibleDirectories= masks all mounts further downLennart Poettering
If a dir is marked to be inaccessible then everything below it should be masked by it.
2016-09-25core: add two new service settings ProtectKernelTunables= and ↵Lennart Poettering
ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.
2016-07-22Merge pull request #3764 from poettering/assorted-stuff-2Martin Pitt
Assorted fixes
2016-07-20namespace: fix wrong return value from mount(2) (#3758)Topi Miettinen
Fix bug introduced by #3263: mount(2) return value is 0 or -1, not errno. Thanks to Evgeny Vereshchagin (@evverx) for reporting.
2016-07-20namespace: add a (void) castLennart Poettering
2016-07-20namespace: minor improvementsLennart Poettering
We generally try to avoid strerror(), due to its threads-unsafety, let's do this here, too. Also, let's be tiny bit more explanatory with the log messages, and let's shorten a few things.
2016-07-19doc,core: Read{Write,Only}Paths= and InaccessiblePaths=Alessandro Puccetti
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept as aliases but they are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths` `read_only_dirs` --> `read_only_paths` `inaccessible_dirs` --> `inaccessible_paths`
2016-07-19namespace: unify limit behavior on non-directory pathsAlessandro Puccetti
Despite the name, `Read{Write,Only}Directories=` already allows for regular file paths to be masked. This commit adds the same behavior to `InaccessibleDirectories=` and makes it explicit in the doc. This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}` {dile,device}nodes and mounts on the appropriate one the paths specified in `InacessibleDirectories=`. Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
2016-05-15namespace: Make private /dev noexec and readonly (#3263)topimiettinen
Private /dev will not be managed by udev or others, so we can make it noexec and readonly after we have made all device nodes. As /dev/shm needs to be writable, we can't use bind_remount_recursive().
2016-05-14namespace: unmount old /dev under our new private /dev (#3254)topimiettinen
Drop all dangling old /dev mounts before mounting a new private /dev tree.
2016-02-11Remove kdbus custom endpoint supportDaniel Mack
This feature will not be used anytime soon, so remove a bit of cruft. The BusPolicy= config directive will stay around as compat noop.
2016-02-10tree-wide: remove Emacs lines from all filesDaniel Mack
This should be handled fine now by .dir-locals.el, so need to carry that stuff in every file.
2015-10-27util-lib: split out allocation calls into alloc-util.[ch]Lennart Poettering
2015-10-27user-util: move UID/GID related macros from macro.h to user-util.hLennart Poettering
2015-10-27util-lib: split out umask-related code to umask-util.hLennart Poettering
2015-10-27util-lib: move string table stuff into its own string-table.[ch]Lennart Poettering
2015-10-27util-lib: move mount related utility calls to mount-util.[ch]Lennart Poettering
2015-10-26socket-util: move remaining socket-related calls from util.[ch] to ↵Lennart Poettering
socket-util.[ch]
2015-10-25util-lib: split out fd-related operations into fd-util.[ch]Lennart Poettering
There are more than enough to deserve their own .c file, hence move them over.
2015-10-24util-lib: split our string related calls from util.[ch] into its own file ↵Lennart Poettering
string-util.[ch] There are more than enough calls doing string manipulations to deserve its own files, hence do something about it. This patch also sorts the #include blocks of all files that needed to be updated, according to the sorting suggestions from CODING_STYLE. Since pretty much every file needs our string manipulation functions this effectively means that most files have sorted #include blocks now. Also touches a few unrelated include files.
2015-09-29tree-wide: port more code to use send_one_fd() and receive_one_fd()Lennart Poettering
Also, make it slightly more powerful, by accepting a flags argument, and make it safe for handling if more than one cmsg attribute happens to be attached.
2015-09-09tree-wide: update empty-if coccinelle script to cover empty-while and moreLennart Poettering
Let's also clean up single-line while and for blocks.
2015-09-09tree-wide: make use of log_error_errno() return value in more casesLennart Poettering
The previous coccinelle semantic patch that improved usage of log_error_errno()'s return value, only looked for log_error_errno() invocations with a single parameter after the error parameter. Update the patch to handle arbitrary numbers of additional arguments.
2015-09-09tree-wide: make use of log_error_errno() return valueLennart Poettering
Turns this: r = -errno; log_error_errno(errno, "foo"); into this: r = log_error_errno(errno, "foo"); and this: r = log_error_errno(errno, "foo"); return r; into this: return log_error_errno(errno, "foo");
2015-06-10util: introduce CMSG_FOREACH() macro and make use of it everywhereLennart Poettering
It's only marginally shorter then the usual for() loop, but certainly more readable.
2015-05-31core/namespace: Protect /usr instead of /home with ProtectSystem=yesJason Pleau
A small typo in ee818b8 caused /home to be put in read-only instead of /usr when ProtectSystem was enabled (ie: not set to "no").
2015-05-21nspawn: finish user namespace supportLennart Poettering
2015-05-20core,nspawn: unify code that moves the root dirLennart Poettering
2015-05-18core: Private*/Protect* options with RootDirectoryAlban Crequy
When a service is chrooted with the option RootDirectory=/opt/..., then the options PrivateDevices, PrivateTmp, ProtectHome, ProtectSystem must mount the directories under $RootDirectory/{dev,tmp,home,usr,boot}. The test-ns tool can test setup_namespace() with and without chroot: $ sudo TEST_NS_PROJECTS=/home/lennart/projects ./test-ns $ sudo TEST_NS_CHROOT=/home/alban/debian-tree TEST_NS_PROJECTS=/home/alban/debian-tree/home/alban/Documents ./test-ns
2015-05-13nspawn: rework custom mount point order, and add support for overlayfsLennart Poettering
Previously all bind mount mounts were applied in the order specified, followed by all tmpfs mounts in the order specified. This is problematic, if bind mounts shall be placed within tmpfs mounts. This patch hence reworks the custom mount point logic, and alwas applies them in strict prefix-first order. This means the order of mounts specified on the command line becomes irrelevant, the right operation will always be executed. While we are at it this commit also adds native support for overlayfs mounts, as supported by recent kernels.
2015-03-31nspawn: change filesystem type from "bind" to NULL in mount() syscallsIago López Galeiras
Try to keep syscalls as minimal as possible.
2015-03-16core/namespace: fix path sortingMichal Schmidt
The comparison function we use for qsorting paths is overly indifferent. Consider these 3 paths for sorting: /foo /bar /foo/foo qsort() may compare: "/foo" with "/bar" => 0, indifference "/bar" with "/foo/foo" => 0, indifference and assume transitively that "/foo" and "/foo/foo" are also indifferent. But this is wrong, we want "/foo" sorted before "/foo/foo". The comparison function must be transitive. Use path_compare(), which behaves properly. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1184016
2015-03-13core: explicitly ignore failure during cleanupZbigniew Jędrzejewski-Szmek
CID #1237550.