summaryrefslogtreecommitdiff
path: root/src/nspawn
AgeCommit message (Collapse)Author
2017-06-16nspawn: mount_legacy_cgns_supported(): Rename variables to not lieLuke Shumaker
mount_legacy_cgns_supported() is very clearly meant to be a version of mount_legacy_cgns_unsupported() modified to cope with the fact that it has already chroot()ed, and thus can't look at the host /sys. So, the loops and such look similar. However, to cope with the fact that it can't look at /sys, it deals with hierarchies in the outermost loop, rather than controllers. Yet, it kept the list variable named "controllers". That's confusing.
2017-06-16nspawn: Merge chown_cgroup(), sync_cgroup(), and create_subcgroup() into one ↵Luke Shumaker
cgroup_setup()
2017-06-16nspawn: Detect the outer_cgver once, and pass that aroundLuke Shumaker
Yes, the relevant functions in cgroup-util actually do cache the values with static variables. But passing it around as a value makes the flow much nicer. The symmetry of having both the inner and outer cg versions as a CGroupUnified enum makes the code much easier to grok; this could be done with cg_version(), but I still think this is more readable.
2017-06-16nspawn: nspawn-cgroup.{c,h}: s/unified_requested/inner_cgver/Luke Shumaker
2017-06-16nspawn: Move cgroup mount stuff from nspawn-mount.c to nspawn-cgroup.cLuke Shumaker
2017-06-16nspawn: Parse UNIFIED_CGROUP_HIERARCHY similarly to any other argLuke Shumaker
2017-06-16nspawn: Clarify detect_unified_cgroup_hierarchy()Luke Shumaker
2017-06-16nspawn: Rename arg_uid_shift -> uid_shiftLuke Shumaker
Naming it arg_uid_shift is confusing because of the global arg_uid_shift in nspawn.c
2017-06-16nspawn: Improve --help textLuke Shumaker
The `--help` text lies about what the `-U` flag does, and under-documents the `--private-users` values. . Fix that.
2017-06-16nspawn: Simplify tmpfs_patch_options() usage, and trickle that upLuke Shumaker
One of the things that tmpfs_patch_options does is take an (optional) UID, and insert "uid=${UID},gid=${UID}" into the options string. So we need a uid_t argument, and a way of telling if we should use it. Fortunately, that is built in to the uid_t value by having UID_INVALID as a possible value. So this is really a feature that requires one argument. Yet, it is somehow taking 4! That is absurd. Simplify it to only take one argument, and have that trickle all the way up to mount_all()'s usage. Now, in may of the uses, the argument becomes uid_shift == 0 ? UID_INVALID : uid_shift because it used to treat uid_shift=0 as invalid unless the patch_ids flag was also set. This keeps the behavior the same. Note that in all cases where it is invoked, if !userns, then uid_shift is 0; we don't have to add any checks for that. That said, I'm pretty sure that "uid=0" and not setting "uid=" are the same, but Christian Brauner seemed to not think so when implementing the cgns support. https://github.com/systemd/systemd/pull/3589
2017-06-16nspawn: mount_sysfs(): Reword the comment about /sys/fs/cgroupLuke Shumaker
The comment explains the obvious, but doesn't even mention the tricky part. Of course we need do set things up before we remount read-only! That's the general theme of the function! What was totally non-obvious is why we only need to create it if cg_ns_supported(), as the directory needs to exist no matter what. From reading the code, I was convinced that it was broken on pre-cgns kernels (pre-4.6, unless a distro backported it). So explain that skippint creating if !cg_ns_supported() is an optimization.
2017-06-16nspawn: if !cg_ns_supported() then force arg_use_cgns = falseLuke Shumaker
It's silly that every time we check arg_use_cgns we also have to check cg_ns_supported(). So, simplify these checks and force arg_use_cgns = false if the kernel doesn't support cg_ns_supported.
2017-06-16nspawn: send_one_fd() uses the "return -errno" convention, not errnoLuke Shumaker
2017-06-16nspawn: fix clobbering of selinux context argZbigniew Jędrzejewski-Szmek
First bug fixed by gcc 7. Yikes. (cherry picked from commit 9ce6d1b319f8655100af6ecf5fd57e4558d57dd1)
2017-06-16tree-wide: adjust fall through comments so that gcc is happyZbigniew Jędrzejewski-Szmek
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways we could deal with that. After we take into account the need to stay compatible with older versions of the compiler (and other compilers), I don't think adding __attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks out too much, a comment is just as good. But gcc has some very specific requiremnts how the comment should look. Adjust it the specific form that it likes. I don't think the extra stuff we had in those comments was adding much value. (Note: the documentation seems to be wrong, and seems to describe a different pattern from the one that is actually used. I guess either the docs or the code will have to change before gcc 7 is finalized.) (cherry picked from commit ec251fe7d5bc24b5d38b0853bc5969f3a0ba06e2)
2017-06-16nspawn: add missing -E to getopt_long (#4860)Wim de With
(cherry picked from commit 2e1f244efd2dfc1a60d032bef3d88b9ba6e0444b)
2017-06-16nspawn: fix cgroup mode detectionTejun Heo
cgroup mode detection is broken in two different ways. * detect_unified_cgroup_hierarchy() is called too nested in outer_child(). sync_cgroup() which is used by run() also needs to know the requested cgroup mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN. This makes it skip syncing the inner cgroup hierarchy on some config combinations. $ cat /proc/self/cgroup | grep systemd 1:name=systemd:/user.slice/user-0.slice/session-c1.scope $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container ... [root@container ~]# cat /proc/self/cgroup | grep systemd 1:name=systemd:/machine.slice/machine-container.x86_64.scope $ exit $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container [root@container ~]# cat /proc/self/cgroup | grep 0:: 0::/ $ exit Note how the unified hierarchy case's path is not synchronized with the host. This for example can cause issues when there are multiple such containers. Fixed by moving detect_unified_cgroup_hierarchy() invocation to main(). * inner_child() was invoking cg_unified_flush(). inner_child() executes fully scoped and can't determine which cgroup mode the host was in. It doesn't make sense to keep flushing the detected mode when the host mode can't change. Fixed by replacing cg_unified_flush() invocations in outer_child() and inner_child() with one in main(). (cherry picked from commit bd15ab41a1347fed8266845f875842d1502e02a6)
2017-05-06Merge tag 'systemd/v232-6.parabola1'systemd/v232-8.parabola2Luke Shumaker
2017-05-06build-sys: add check for gperf lookup function signature (#5055)Mike Gilbert
gperf-3.1 generates lookup functions that take a size_t length parameter instead of unsigned int. Test for this at configure time. Fixes: https://github.com/systemd/systemd/issues/5039
2016-12-17Merge tag 'systemd/v232-4'systemd/v232-6Luke Shumaker
2016-12-17nspawn: don't hide --bind=/tmp/* mountsDave Reisner
This is a v232-applicable version of upstream c9fd987279a462e.
2016-12-17nspawn: fix exit code for --help and --version (#4609)Martin Pitt
Commit b006762 inverted the initial exit code which is relevant for --help and --version without a particular reason. For these special options, parse_argv() returns 0 so that our main() immediately skips to the end without adjusting "ret". Otherwise, if an actual container is being started, ret is set on error in run(), which still provides the "non-zero exit on error" behaviour. Fixes #4605.
2016-12-17Revert "nspawn: try to bind mount resolved's resolv.conf snippet into the ↵systemd/v232-4Dave Reisner
container" This reverts commit 3539724c26a1b2b00c4eb3c004b635a4b8647de6.
2016-11-02nspawn: if we set up a loopback device, try to mount it with "discard"Lennart Poettering
Let's make sure that our loopback files remain sparse, hence let's set "discard" as mount option on file systems that support it if the backing device is a loopback.
2016-10-24seccomp: add new seccomp_init_conservative() helperLennart Poettering
This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence aded additional "*" here and there. Remove that.
2016-10-23nspawn: become a new root earlyEvgeny Vereshchagin
https://github.com/torvalds/linux/commit/036d523641c66bef713042894a17f4335f199e49 > vfs: Don't create inodes with a uid or gid unknown to the vfs It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955 $ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide. Press ^] three times within 1s to kill container. Child died too early. Selected user namespace base 1073283072 and range 65536. Failed to mount to /sys/fs/cgroup/systemd: No such file or directory Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519 Fixes: #4352
2016-10-23nspawn: really lchown(uid/gid)Evgeny Vereshchagin
https://github.com/systemd/systemd/pull/4372#issuecomment-253723849: * `mount_all (outer_child)` creates `container_dir/sys/fs/selinux` * `mount_all (outer_child)` doesn't patch `container_dir/sys/fs` and so on. * `mount_sysfs (inner_child)` tries to create `/sys/fs/cgroup` * This fails 370 stat("/sys/fs", {st_dev=makedev(0, 28), st_ino=13880, st_mode=S_IFDIR|0755, st_nlink=3, st_uid=65534, st_gid=65534, st_blksize=4096, st_blocks=0, st_size=60, st_atime=2016/10/14-05:16:43.398665943, st_mtime=2016/10/14-05:16:43.399665943, st_ctime=2016/10/14-05:16:43.399665943}) = 0 370 mkdir("/sys/fs/cgroup", 0755) = -1 EACCES (Permission denied) * `mount_syfs (inner_child)` ignores that error and mount(NULL, "/sys", NULL, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 * `mount_cgroups` finally fails
2016-10-21nspawn, NEWS: add missing "s" in --private-users-chown (#4438)Zbigniew Jędrzejewski-Szmek
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-14nspawn: remove unused variable (#4369)Thomas H. P. Andersen
2016-10-13nspawn: cleanup and chown the synced cgroup hierarchy (#4223)Evgeny Vereshchagin
Fixes: #4181
2016-10-12Merge pull request #4351 from keszybz/nspawn-debuggingLennart Poettering
Enhance nspawn debug logs for mount/unmount operations
2016-10-11nspawn: let's mount(/tmp) inside the user namespace (#4340)Evgeny Vereshchagin
Fixes: host# systemd-nspawn -D ... -U -b systemd.unit=multi-user.target ... $ grep /tmp /proc/self/mountinfo 154 145 0:41 / /tmp rw - tmpfs tmpfs rw,seclabel,uid=1036124160,gid=1036124160 $ umount /tmp umount: /root/tmp: not mounted $ systemctl poweroff ... [FAILED] Failed unmounting Temporary Directory.
2016-10-11nspawn,mount-util: add [u]mount_verbose and use it in nspawnZbigniew Jędrzejewski-Szmek
This makes it easier to debug failed nspawn invocations: Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")... Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Bind-mounting /proc/sys on /proc/sys (MS_BIND "")... Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")... Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")... Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")... Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")... Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11nspawn: small cleanups in get_controllers()Zbigniew Jędrzejewski-Szmek
- check for oom after strdup - no need to truncate the line since we're only extracting one field anyway - use STR_IN_SET
2016-10-11nspawn: simplify arg_us_cgns passingZbigniew Jędrzejewski-Szmek
We would check the condition cg_ns_supported() twice. No functional change.
2016-10-10Merge pull request #4332 from keszybz/nspawn-arguments-3Lennart Poettering
nspawn --private-users parsing, v2
2016-10-10Merge pull request #4310 from keszybz/nspawn-autodetectEvgeny Vereshchagin
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10nspawn: better error messages for parsing errorsZbigniew Jędrzejewski-Szmek
In particular, the check for arg_uid_range <= 0 is moved to the end, so that "foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10nspawn,man: fix parsing of numeric args for --private-users, accept any booleanZbigniew Jędrzejewski-Szmek
This is like the previous reverted commit, but any boolean is still accepted, not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10Revert "nspawn: fix parsing of numeric arguments for --private-users"Zbigniew Jędrzejewski-Szmek
This reverts commit bfd292ec35c7b768f9fb5cff4d921f3133e62b19.
2016-10-09nspawn: fix parsing of numeric arguments for --private-usersZbigniew Jędrzejewski-Szmek
The documentation says lists "yes", "no", "pick", and numeric arguments. But parse_boolean was attempted first, so various numeric arguments were misinterpreted. In particular, this fixes --private-users=0 to mean the same thing as --private-users=0:65536. While at it, use strndupa to avoid some error handling. Also give a better error for an empty UID range. I think it's likely that people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09nspawn: reindent tableZbigniew Jędrzejewski-Szmek
2016-10-08nspawn: also fall back to legacy cgroup hierarchy for old containersZbigniew Jędrzejewski-Szmek
Current systemd version detection routine cannot detect systemd 230, only systmed >= 231. This means that we'll still use the legacy hierarchy in some cases where we wouldn't have too. If somebody figures out a nice way to detect systemd 230 this can be later improved.
2016-10-08nspawn: use mixed cgroup hierarchy only when container has new systemdZbigniew Jędrzejewski-Szmek
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy. So make an educated guess, and use the mixed hierarchy in that case. Tested by running the host with mixed hierarchy (i.e. simply using a recent kernel with systemd from git), and booting first a container with older systemd, and then one with a newer systemd. Fixes #4008.
2016-10-08nspawn: fix spurious reboot if container process returns 133Zbigniew Jędrzejewski-Szmek
2016-10-08nspawn: move the main loop body out to a new functionZbigniew Jędrzejewski-Szmek
The new function has 416 lines by itself! "return log_error_errno" is used to nicely reduce the volume of error handling code. A few minor issues are fixed on the way: - positive value was used as error value (EIO), causing systemd-nspawn to return success, even though it shouldn't. - In two places random values were used as error status, when the actual value was in an unusual place (etc_password_lock, notify_socket). Those are the only functional changes. There is another potential issue, which is marked with a comment, and left unresolved: the container can also return 133 by itself, causing a spurious reboot.
2016-10-08nspawn: check env var first, detect secondZbigniew Jędrzejewski-Szmek
If we are going to use the env var to override the detection result anyway, there is not point in doing the detection, especially that it can fail.
2016-10-06tree-wide: drop some misleading compiler warningsLennart Poettering
gcc at some optimization levels thinks thes variables were used without initialization. it's wrong, but let's make the message go anyway.
2016-10-05nspawn: add log message to let users know that nspawn needs an empty /dev ↵Djalal Harouni
directory (#4226) Fixes https://github.com/systemd/systemd/issues/3695 At the same time it adds a protection against userns chown of inodes of a shared mount point.