summaryrefslogtreecommitdiff
path: root/src/nspawn
AgeCommit message (Collapse)Author
2016-12-01nspawn: add ability to configure overlay mounts to .nspawn filesLennart Poettering
Fixes: #4634
2016-12-01nspawn: split out overlayfs argument parsing into a function of its ownLennart Poettering
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and bind_mount_parse().
2016-12-01nspawn: use -ENOMEM instead of log_oom() in one caseLennart Poettering
The function is of the "library" kind and doesn't log ENOMEM in all other cases, hence fix the one outlier.
2016-12-01nspawn: make use of CHASE_NON_EXISTING when locking imageLennart Poettering
If --template= is used on an image, then the image might not exist initially. We can use CHASE_NON_EXISTING to properly lock the image already before it exists. Let's do so.
2016-12-01nspawn: use the new CHASE_NON_EXISTING flag when resolving mount pointsLennart Poettering
This restores the ability to implicitly create files/directories to mount specified mount points on.
2016-12-01fs-util: add flags parameter to chase_symlinks()Lennart Poettering
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of chase_symlinks_prefix().
2016-12-01nspawn: use chase_symlinks() on all paths specified via --tmpfs=, --bind= ↵Lennart Poettering
and so on Fixes: #2860
2016-12-01nspawn: coding style: don't mix variable declarations and function callsLennart Poettering
2016-12-01nspawn: use realloc_multiply() where it makes senseLennart Poettering
2016-12-01nspawn: accept --ephemeral --template= as alternative for --ephemeral ↵Lennart Poettering
--directory= As suggested in PR #3667. This PR simply ensures that --template= can be used as alternative to --directory= when --ephemeral is used, following the logic that for ephemeral options the source directory is actually a template. This does not deprecate usage of --directory= with --ephemeral, as I am not convinced the old logic wouldn't make sense. Fixes: #3667
2016-12-01nspawn: properly handle image/directory paths that are symlinksLennart Poettering
This resolves any paths specified on --directory=, --template=, and --image= before using them. This makes sure nspawn can be used correctly on symlinked images and directory trees. Fixes: #2001
2016-12-01tree-wide: stop using canonicalize_file_name(), use chase_symlinks() insteadLennart Poettering
Let's use chase_symlinks() everywhere, and stop using GNU canonicalize_file_name() everywhere. For most cases this should not change behaviour, however increase exposure of our function to get better tested. Most importantly in a few cases (most notably nspawn) it can take the correct root directory into account when chasing symlinks.
2016-11-22nspawn: don't require chown() if userns is not onLennart Poettering
Fixes: #4711
2016-11-22nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshotLennart Poettering
Given that other file systems (notably: xfs) support reflinks these days, let's extend the file system snapshotting logic to fall back to plan copies or reflinks when full btrfs subvolume snapshots are not available. This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn --template=" available on non-btrfs subvolumes. Of course, both operations will still be slower on non-btrfs than on btrfs (simply because reflinking each file individually in a directory tree is still slower than doing this in one step for a whole subvolume), but it's probably good enough for many cases, and we should provide the users with the tools, they have to figure out what's good for them. Note that "machinectl clone" already had a fallback like this in place, this patch generalizes this, and adds similar support to our other cases.
2016-11-22nspawn: remove temporary root directory on exitLennart Poettering
When mountint a loopback image, we need a temporary root directory we can mount stuff to. Make sure to actually remove it when exiting, so that we don't leave stuff around in /tmp unnecessarily. See: #4664
2016-11-22nspawn: try to wait for the container PID 1 to exit, before we exitLennart Poettering
Let's make the shutdown logic synchronous, so that there's a better chance to detach the loopback device after use.
2016-11-22nspawn: support ephemeral boots from imagesLennart Poettering
Previously --ephemeral was only supported with container trees in btrfs subvolumes (i.e. in combination with --directory=). This adds support for --ephemeral in conjunction with disk images (i.e. --image=) too. As side effect this fixes that --ephemeral was accepted but ignored when using -M on a container that turned out to be an image. Fixes: #4664
2016-11-18Merge pull request #4395 from s-urbaniak/rw-supportLennart Poettering
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18nspawn: R/W support for /sys, and /proc/sysSergiusz Urbaniak
This commit adds the possibility to leave /sys, and /proc/sys read-write. It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE to enable this feature. If set to "yes", /sys, and /proc/sys will be read-write. If set to "no", /sys, and /proc/sys will be read-only. If set to "network" /proc/sys/net will be read-write. This is useful in use-cases, where systemd-nspawn is used in an external network namespace. This adds the possibility to start privileged containers which need more control over settings in the /proc, and /sys filesystem. This is also a follow-up on the discussion from https://github.com/systemd/systemd/pull/4018#r76971862 where an introduction of a simple env var to enable R/W support for those directories was already discussed.
2016-11-14nspawn: restart the whole systemd-nspawn@.service unit on container reboot ↵Zbigniew Jędrzejewski-Szmek
(#4613) Since 133 is now used in a few places, add a #define for it. Also make the status message a bit informative. Another issue introduced in b006762. The logic was borked, we were supposed to return 0 to break the loop, and 133 to restart the container, not the other way around. But this doesn't seem to work, reboot fails with: Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists So actually the version before this patch worked better, since 133 > 0 and we'd at least loop internally.
2016-11-08nspawn: fix condition for mounting resolv.conf (#4622)Christian Hesse
The file /usr/lib/systemd/resolv.conf can be stale, it does not tell us whether or not systemd-resolved is running or not. So check for /run/systemd/resolve/resolv.conf as well, which is created at runtime and hence is a better indication.
2016-11-08Merge pull request #4612 from keszybz/format-stringsZbigniew Jędrzejewski-Szmek
Format string tweaks (and a small fix on 32bit)
2016-11-07nspawn: fix exit code for --help and --version (#4609)Martin Pitt
Commit b006762 inverted the initial exit code which is relevant for --help and --version without a particular reason. For these special options, parse_argv() returns 0 so that our main() immediately skips to the end without adjusting "ret". Otherwise, if an actual container is being started, ret is set on error in run(), which still provides the "non-zero exit on error" behaviour. Fixes #4605.
2016-11-07Rename formats-util.h to format-util.hZbigniew Jędrzejewski-Szmek
We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.
2016-11-07systemd-nspawn: decrease non-fatal mount errors to debug level (#4569)tblume
non-fatal mount errors shouldn't be logged as warnings.
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-11-02nspawn: if we set up a loopback device, try to mount it with "discard"Lennart Poettering
Let's make sure that our loopback files remain sparse, hence let's set "discard" as mount option on file systems that support it if the backing device is a loopback.
2016-10-24seccomp: add new seccomp_init_conservative() helperLennart Poettering
This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence aded additional "*" here and there. Remove that.
2016-10-23nspawn: become a new root earlyEvgeny Vereshchagin
https://github.com/torvalds/linux/commit/036d523641c66bef713042894a17f4335f199e49 > vfs: Don't create inodes with a uid or gid unknown to the vfs It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955 $ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide. Press ^] three times within 1s to kill container. Child died too early. Selected user namespace base 1073283072 and range 65536. Failed to mount to /sys/fs/cgroup/systemd: No such file or directory Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519 Fixes: #4352
2016-10-23nspawn: really lchown(uid/gid)Evgeny Vereshchagin
https://github.com/systemd/systemd/pull/4372#issuecomment-253723849: * `mount_all (outer_child)` creates `container_dir/sys/fs/selinux` * `mount_all (outer_child)` doesn't patch `container_dir/sys/fs` and so on. * `mount_sysfs (inner_child)` tries to create `/sys/fs/cgroup` * This fails 370 stat("/sys/fs", {st_dev=makedev(0, 28), st_ino=13880, st_mode=S_IFDIR|0755, st_nlink=3, st_uid=65534, st_gid=65534, st_blksize=4096, st_blocks=0, st_size=60, st_atime=2016/10/14-05:16:43.398665943, st_mtime=2016/10/14-05:16:43.399665943, st_ctime=2016/10/14-05:16:43.399665943}) = 0 370 mkdir("/sys/fs/cgroup", 0755) = -1 EACCES (Permission denied) * `mount_syfs (inner_child)` ignores that error and mount(NULL, "/sys", NULL, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 * `mount_cgroups` finally fails
2016-10-23nspawn: use the return value from asprintf instead of checking the pointerZbigniew Jędrzejewski-Szmek
If allocation fails, the value of the point is "undefined". In practice this matters very little, but for consistency with rest of the code, let's check the return value.
2016-10-23tree-wide: drop NULL sentinel from strjoinZbigniew Jędrzejewski-Szmek
This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c) git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.
2016-10-21nspawn, NEWS: add missing "s" in --private-users-chown (#4438)Zbigniew Jędrzejewski-Szmek
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-14nspawn: remove unused variable (#4369)Thomas H. P. Andersen
2016-10-13nspawn: cleanup and chown the synced cgroup hierarchy (#4223)Evgeny Vereshchagin
Fixes: #4181
2016-10-12Merge pull request #4351 from keszybz/nspawn-debuggingLennart Poettering
Enhance nspawn debug logs for mount/unmount operations
2016-10-11nspawn: let's mount(/tmp) inside the user namespace (#4340)Evgeny Vereshchagin
Fixes: host# systemd-nspawn -D ... -U -b systemd.unit=multi-user.target ... $ grep /tmp /proc/self/mountinfo 154 145 0:41 / /tmp rw - tmpfs tmpfs rw,seclabel,uid=1036124160,gid=1036124160 $ umount /tmp umount: /root/tmp: not mounted $ systemctl poweroff ... [FAILED] Failed unmounting Temporary Directory.
2016-10-11nspawn,mount-util: add [u]mount_verbose and use it in nspawnZbigniew Jędrzejewski-Szmek
This makes it easier to debug failed nspawn invocations: Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")... Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Bind-mounting /proc/sys on /proc/sys (MS_BIND "")... Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")... Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")... Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")... Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")... Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11nspawn: small cleanups in get_controllers()Zbigniew Jędrzejewski-Szmek
- check for oom after strdup - no need to truncate the line since we're only extracting one field anyway - use STR_IN_SET
2016-10-11nspawn: simplify arg_us_cgns passingZbigniew Jędrzejewski-Szmek
We would check the condition cg_ns_supported() twice. No functional change.
2016-10-10Merge pull request #4332 from keszybz/nspawn-arguments-3Lennart Poettering
nspawn --private-users parsing, v2
2016-10-10Merge pull request #4310 from keszybz/nspawn-autodetectEvgeny Vereshchagin
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10nspawn: better error messages for parsing errorsZbigniew Jędrzejewski-Szmek
In particular, the check for arg_uid_range <= 0 is moved to the end, so that "foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10nspawn,man: fix parsing of numeric args for --private-users, accept any booleanZbigniew Jędrzejewski-Szmek
This is like the previous reverted commit, but any boolean is still accepted, not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10Revert "nspawn: fix parsing of numeric arguments for --private-users"Zbigniew Jędrzejewski-Szmek
This reverts commit bfd292ec35c7b768f9fb5cff4d921f3133e62b19.
2016-10-09nspawn: fix parsing of numeric arguments for --private-usersZbigniew Jędrzejewski-Szmek
The documentation says lists "yes", "no", "pick", and numeric arguments. But parse_boolean was attempted first, so various numeric arguments were misinterpreted. In particular, this fixes --private-users=0 to mean the same thing as --private-users=0:65536. While at it, use strndupa to avoid some error handling. Also give a better error for an empty UID range. I think it's likely that people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09nspawn: reindent tableZbigniew Jędrzejewski-Szmek
2016-10-08nspawn: also fall back to legacy cgroup hierarchy for old containersZbigniew Jędrzejewski-Szmek
Current systemd version detection routine cannot detect systemd 230, only systmed >= 231. This means that we'll still use the legacy hierarchy in some cases where we wouldn't have too. If somebody figures out a nice way to detect systemd 230 this can be later improved.
2016-10-08nspawn: use mixed cgroup hierarchy only when container has new systemdZbigniew Jędrzejewski-Szmek
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy. So make an educated guess, and use the mixed hierarchy in that case. Tested by running the host with mixed hierarchy (i.e. simply using a recent kernel with systemd from git), and booting first a container with older systemd, and then one with a newer systemd. Fixes #4008.