path: root/src/nspawn/nspawn.c
Age | Commit message | Author
2016-12-21dissect: make using a generic partition as root partition optionalLennart Poettering
In preparation for reusing the image dissector in the GPT auto-discovery logic, only optionally fail the dissection when we can't identify a root partition. In the GPT auto-discovery we are completely fine with any kind of root, given that we run when it is already mounted and all we do is find some additional auxiliary partitions on the same disk.
2016-12-21nspawn: restore --volatile=yes supportLennart Poettering
This was broken by 19caffac75a2590a0c5ebc2a0214960f8188aec7 which remounted the root directory to MS_SHARED before applying the volatile mount logic. This broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we need MS_MOVE in the volatile mount logic to rearrange the directory tree. Simply swap the order here, apply the volatile logic before we switch to MS_SHARED.
2016-12-20dissect: optionally, only look for GPT partition tables, nothing elseLennart Poettering
This is useful for reusing the dissector logic in the gpt-auto-discovery logic: there we really don't want to use MBR or naked file systems as root device.
2016-12-14nspawn: flush out environment block of the -a stub init processLennart Poettering
The container detection code in virt.c we ship checks for /proc/1/environ, looking for "container=" in it. Let's make sure our "-a" init stub exposes that correctly. Without this "systemd-detect-virt" run in a "-a" container won't detect that it is being run in a container.
2016-12-13nspawn: when getting SIGCHLD make sure it's from the first child (#4855)Andrey Ulanov
When getting SIGCHLD we should not assume that it was the first child forked from systemd-nspawn that has died, as it may also be coming from an orphan process. This change adds a signal handler that ignores SIGCHLD unless it came from the first containerized child - the real child.

Before this change the problem can be reproduced as follows:

    $ sudo systemd-nspawn --directory=/container-root --share-system
    Press ^] three times within 1s to kill container.
    [root@andreyu-coreos ~]# { true & } &
    [1] 22201
    [root@andreyu-coreos ~]# Container root-fedora-latest terminated by signal KILL
2016-12-10Merge pull request #4795 from poettering/dissectZbigniew Jędrzejewski-Szmek
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10nspawn: add missing -E to getopt_long (#4860)Wim de With
2016-12-07nspawn: resolv.conf might not be created initially (#4799)Franck Bui
It can happen that resolv.conf is missing in a minimal rootfs, in which case the following warning is emitted:

    Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory

This patch fixes this case.
2016-12-07nspawn/dissect: automatically discover dm-verity verity partitionsLennart Poettering
This adds support for discovering and making use of properly tagged dm-verity data integrity partitions. This extends both systemd-nspawn and systemd-dissect with a new --root-hash= switch that takes the root hash to use for the root partition, and is otherwise fully automatic.

Verity partitions are discovered automatically by GPT table type UUIDs, as listed in https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/ (which I updated prior to this change, to include new UUIDs for this purpose).

mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images that carry the necessary integrity data. With that PR and this commit, the following simple lines suffice to boot up an integrity-protected container image:

```
# mkdir test
# cd test
# mkosi --verity
# systemd-nspawn -i ./image.raw -bn
```

Note that mkosi writes the image file to "image.raw" next to a file "image.roothash" that contains the root hash. systemd-nspawn will look for that file and use it if it exists, in case --root-hash= is not specified explicitly.
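The root hash lookup described in this commit message can be illustrated with a small Python sketch. It is not the nspawn implementation (which is C); the function name `find_root_hash` and the rule "replace the `.raw` suffix with `.roothash`" are assumptions inferred from the `image.raw` / `image.roothash` example above.

```python
import os
from typing import Optional


def find_root_hash(image_path: str, explicit_hash: Optional[str] = None) -> Optional[str]:
    """Return the root hash to use for an image, or None if there is none.

    Hypothetical helper: an explicitly given hash (think --root-hash=) wins;
    otherwise look for a sibling file with the ".roothash" suffix.
    """
    if explicit_hash is not None:
        return explicit_hash

    base, ext = os.path.splitext(image_path)
    if ext != ".raw":
        # Assumption: only "<name>.raw" images get the automatic lookup.
        return None

    try:
        with open(base + ".roothash", "r", encoding="ascii") as f:
            return f.read().strip()
    except FileNotFoundError:
        # No sidecar file: the image is simply used without a root hash.
        return None
```

With `image.raw` and `image.roothash` in the same directory, `find_root_hash("./image.raw")` returns the contents of the sidecar file, while passing an explicit hash bypasses the lookup entirely.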
2016-12-07nspawn: when generating a machine name from an image name, truncate .raw suffixLennart Poettering
Let's prettify the machine name we generate for image-based containers: let's chop off the .raw suffix before using it as machine name.
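As a rough illustration of the suffix truncation, here is a Python sketch. The helper name `machine_name_from_image` is hypothetical, and the real code additionally has to validate the result as a machine name, which is left out here.

```python
import os


def machine_name_from_image(image_path: str) -> str:
    """Derive a machine name from an image path, dropping a trailing ".raw"."""
    name = os.path.basename(image_path.rstrip("/"))
    # Only strip the suffix if something is left over afterwards.
    if name.endswith(".raw") and len(name) > len(".raw"):
        name = name[: -len(".raw")]
    return name


print(machine_name_from_image("/var/lib/machines/fedora.raw"))  # fedora
print(machine_name_from_image("./disk.img"))                    # disk.img
```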
2016-12-07dissect: add support for encrypted imagesLennart Poettering
This adds support to the image dissector to deal with encrypted images (only LUKS). Given that we now have a neatly isolated image dissector codebase, let's add a new feature to it: support for automatically dealing with encrypted images. This is then exposed in systemd-dissect and nspawn. It's pretty basic: only support for passphrase-based encryption. In order to ensure that "systemd-dissect --mount" results in mount points whose backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libcryptsetup at the moment doesn't provide a proper API for this. Thankfully, the ioctl() API is pretty easy to use.
2016-12-07nspawn: port nspawn to new generalized image dissection codeLennart Poettering
Let's make use of the new internal API. This mostly doesn't change anything for the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new code can handle disk images with no partition tables, and make any detected images directly the root.
2016-12-01util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENTLennart Poettering
As suggested by @keszybz
2016-12-01nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"Lennart Poettering
If a source path is prefixed with "+" it is taken relative to the container's root directory instead of the host. This permits easily establishing bind and overlay mounts based on data from the container rather than the host. This also reworks custom_mounts_prepare(), and turns it into two functions: one custom_mount_check_all() that remains in nspawn.c but purely verifies the validity of the custom mounts configured. And one called custom_mount_prepare_all() that actually does the preparation step, sorts the custom mounts, resolves relative paths, and allocates temporary directories as necessary.
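The "+" prefix rule can be sketched in a few lines of Python. This is only an illustration: `resolve_mount_source` is a hypothetical name, and the sketch does plain string joining, whereas a real implementation would also have to resolve symlinks relative to the container root.

```python
import os


def resolve_mount_source(source: str, container_root: str) -> str:
    """Map a --bind=/--overlay= source path to a host path.

    A leading "+" means the path is relative to the container's root
    directory; anything else is taken as a host path unchanged.
    """
    if source.startswith("+"):
        # Drop the "+" and leading slashes so join() stays below the root.
        relative = source[1:].lstrip("/")
        return os.path.join(container_root, relative)
    return source


root = "/var/lib/machines/fedora"
print(resolve_mount_source("+/srv/data", root))  # /var/lib/machines/fedora/srv/data
print(resolve_mount_source("/srv/data", root))   # /srv/data
```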
2016-12-01tree-wide: set SA_RESTART for signal handlers we installLennart Poettering
We already set it in most cases, but make sure to set it in all others too, and document that that's a good idea.
2016-12-01nspawn: split out overlayfs argument parsing into a function of its ownLennart Poettering
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and bind_mount_parse().
2016-12-01nspawn: make use of CHASE_NON_EXISTING when locking imageLennart Poettering
If --template= is used on an image, then the image might not exist initially. We can use CHASE_NON_EXISTING to properly lock the image already before it exists. Let's do so.
2016-12-01fs-util: add flags parameter to chase_symlinks()Lennart Poettering
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of chase_symlinks_prefix().
2016-12-01nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory=Lennart Poettering
As suggested in PR #3667. This PR simply ensures that --template= can be used as alternative to --directory= when --ephemeral is used, following the logic that for ephemeral options the source directory is actually a template. This does not deprecate usage of --directory= with --ephemeral, as I am not convinced the old logic wouldn't make sense. Fixes: #3667
2016-12-01nspawn: properly handle image/directory paths that are symlinksLennart Poettering
This resolves any paths specified on --directory=, --template=, and --image= before using them. This makes sure nspawn can be used correctly on symlinked images and directory trees. Fixes: #2001
2016-12-01tree-wide: stop using canonicalize_file_name(), use chase_symlinks() insteadLennart Poettering
Let's use chase_symlinks() everywhere, and stop using GNU canonicalize_file_name() everywhere. For most cases this should not change behaviour, however increase exposure of our function to get better tested. Most importantly in a few cases (most notably nspawn) it can take the correct root directory into account when chasing symlinks.
2016-11-22nspawn: add fallback to normal copy/reflink when we cannot btrfs snapshotLennart Poettering
Given that other file systems (notably: xfs) support reflinks these days, let's extend the file system snapshotting logic to fall back to plain copies or reflinks when full btrfs subvolume snapshots are not available. This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn --template=" available on non-btrfs subvolumes. Of course, both operations will still be slower on non-btrfs than on btrfs (simply because reflinking each file individually in a directory tree is still slower than doing this in one step for a whole subvolume), but it's probably good enough for many cases, and we should provide the users with the tools, they have to figure out what's good for them. Note that "machinectl clone" already had a fallback like this in place, this patch generalizes this, and adds similar support to our other cases.
2016-11-22nspawn: remove temporary root directory on exitLennart Poettering
When mounting a loopback image, we need a temporary root directory we can mount stuff to. Make sure to actually remove it when exiting, so that we don't leave stuff around in /tmp unnecessarily. See: #4664
2016-11-22nspawn: try to wait for the container PID 1 to exit, before we exitLennart Poettering
Let's make the shutdown logic synchronous, so that there's a better chance to detach the loopback device after use.
2016-11-22nspawn: support ephemeral boots from imagesLennart Poettering
Previously --ephemeral was only supported with container trees in btrfs subvolumes (i.e. in combination with --directory=). This adds support for --ephemeral in conjunction with disk images (i.e. --image=) too. As side effect this fixes that --ephemeral was accepted but ignored when using -M on a container that turned out to be an image. Fixes: #4664
2016-11-18Merge pull request #4395 from s-urbaniak/rw-supportLennart Poettering
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18nspawn: R/W support for /sys, and /proc/sysSergiusz Urbaniak
This commit adds the possibility to leave /sys, and /proc/sys read-write. It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE to enable this feature. If set to "yes", /sys, and /proc/sys will be read-write. If set to "no", /sys, and /proc/sys will be read-only. If set to "network" /proc/sys/net will be read-write. This is useful in use-cases, where systemd-nspawn is used in an external network namespace. This adds the possibility to start privileged containers which need more control over settings in the /proc, and /sys filesystem. This is also a follow-up on the discussion from https://github.com/systemd/systemd/pull/4018#r76971862 where an introduction of a simple env var to enable R/W support for those directories was already discussed.
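The three documented values of SYSTEMD_NSPAWN_API_VFS_WRITABLE can be summarized in a short Python sketch. The function name `writable_api_paths` is hypothetical, treating an unset variable like "no" is an assumption, and only the three literal values named in the commit message are accepted here.

```python
from typing import Mapping, Set


def writable_api_paths(environ: Mapping[str, str]) -> Set[str]:
    """Return the API VFS paths that should be left read-write."""
    # Assumption: an unset variable behaves like "no" (everything read-only).
    value = environ.get("SYSTEMD_NSPAWN_API_VFS_WRITABLE", "no")
    if value == "yes":
        return {"/sys", "/proc/sys"}
    if value == "network":
        return {"/proc/sys/net"}
    if value == "no":
        return set()
    raise ValueError(f"unsupported SYSTEMD_NSPAWN_API_VFS_WRITABLE value: {value!r}")


print(sorted(writable_api_paths({"SYSTEMD_NSPAWN_API_VFS_WRITABLE": "yes"})))
# ['/proc/sys', '/sys']
```

In practice the mapping would be fed from `os.environ`; passing it in explicitly keeps the sketch easy to test.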
2016-11-14nspawn: restart the whole systemd-nspawn@.service unit on container reboot (#4613)Zbigniew Jędrzejewski-Szmek
Since 133 is now used in a few places, add a #define for it. Also make the status message a bit informative. Another issue introduced in b006762. The logic was borked, we were supposed to return 0 to break the loop, and 133 to restart the container, not the other way around. But this doesn't seem to work, reboot fails with:

    Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists

So actually the version before this patch worked better, since 133 > 0 and we'd at least loop internally.
2016-11-08nspawn: fix condition for mounting resolv.conf (#4622)Christian Hesse
The file /usr/lib/systemd/resolv.conf can be stale; it does not tell us whether systemd-resolved is running. So check for /run/systemd/resolve/resolv.conf as well, which is created at runtime and hence is a better indication.
2016-11-08Merge pull request #4612 from keszybz/format-stringsZbigniew Jędrzejewski-Szmek
Format string tweaks (and a small fix on 32bit)
2016-11-07nspawn: fix exit code for --help and --version (#4609)Martin Pitt
Commit b006762 inverted the initial exit code which is relevant for --help and --version without a particular reason. For these special options, parse_argv() returns 0 so that our main() immediately skips to the end without adjusting "ret". Otherwise, if an actual container is being started, ret is set on error in run(), which still provides the "non-zero exit on error" behaviour. Fixes #4605.
2016-11-07Rename formats-util.h to format-util.hZbigniew Jędrzejewski-Szmek
We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-11-02nspawn: if we set up a loopback device, try to mount it with "discard"Lennart Poettering
Let's make sure that our loopback files remain sparse, hence let's set "discard" as mount option on file systems that support it if the backing device is a loopback.
2016-10-23nspawn: become a new root earlyEvgeny Vereshchagin
https://github.com/torvalds/linux/commit/036d523641c66bef713042894a17f4335f199e49

> vfs: Don't create inodes with a uid or gid unknown to the vfs

It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids.

So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955

    $ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target
    Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide.
    Press ^] three times within 1s to kill container.
    Child died too early.
    Selected user namespace base 1073283072 and range 65536.
    Failed to mount to /sys/fs/cgroup/systemd: No such file or directory

Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519

Fixes: #4352
2016-10-23tree-wide: drop NULL sentinel from strjoinZbigniew Jędrzejewski-Szmek
This makes strjoin and strjoina more similar and avoids the useless final argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.
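To see what the sed substitution in this commit does to a single line, the same regular expression can be tried out in Python. This is only a demonstration of the pattern; the helper name `drop_null_sentinel` is made up for the example, and the actual tree-wide change was done with spatch and sed as quoted above.

```python
import re

# Same pattern as the sed expression: greedy match up to the last ", NULL)".
_STRJOIN_NULL = re.compile(r"strjoin\((.*), NULL\)")


def drop_null_sentinel(line: str) -> str:
    """Rewrite strjoin(a, b, NULL) into strjoin(a, b) on one source line."""
    return _STRJOIN_NULL.sub(r"strjoin(\1)", line)


print(drop_null_sentinel('p = strjoin(root, "/", name, NULL);'))
# p = strjoin(root, "/", name);
```

Because the pattern requires `(` directly after `strjoin`, calls to `strjoina(...)` are left untouched.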
2016-10-21nspawn, NEWS: add missing "s" in --private-users-chown (#4438)Zbigniew Jędrzejewski-Szmek
2016-10-13nspawn: cleanup and chown the synced cgroup hierarchy (#4223)Evgeny Vereshchagin
Fixes: #4181
2016-10-11nspawn,mount-util: add [u]mount_verbose and use it in nspawnZbigniew Jędrzejewski-Szmek
This makes it easier to debug failed nspawn invocations:

    Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
    Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
    Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
    Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
    Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
    Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
    Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
    Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
    Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
    Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
    Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
    Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
    Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
    Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
    Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11nspawn: simplify arg_us_cgns passingZbigniew Jędrzejewski-Szmek
We would check the condition cg_ns_supported() twice. No functional change.
2016-10-10Merge pull request #4332 from keszybz/nspawn-arguments-3Lennart Poettering
nspawn --private-users parsing, v2
2016-10-10Merge pull request #4310 from keszybz/nspawn-autodetectEvgeny Vereshchagin
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10nspawn: better error messages for parsing errorsZbigniew Jędrzejewski-Szmek
In particular, the check for arg_uid_range <= 0 is moved to the end, so that "foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10nspawn,man: fix parsing of numeric args for --private-users, accept any booleanZbigniew Jędrzejewski-Szmek
This is like the previous reverted commit, but any boolean is still accepted, not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10Revert "nspawn: fix parsing of numeric arguments for --private-users"Zbigniew Jędrzejewski-Szmek
This reverts commit bfd292ec35c7b768f9fb5cff4d921f3133e62b19.
2016-10-09nspawn: fix parsing of numeric arguments for --private-usersZbigniew Jędrzejewski-Szmek
The documentation lists "yes", "no", "pick", and numeric arguments. But parse_boolean was attempted first, so various numeric arguments were misinterpreted. In particular, this fixes --private-users=0 to mean the same thing as --private-users=0:65536. While at it, use strndupa to avoid some error handling. Also give a better error for an empty UID range. I think it's likely that people will use --private-users=0:0 thinking that the argument means UID:GID.
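The parsing order matters here, so a Python sketch of the intended behaviour may help. It is not the nspawn code: `parse_private_users`, the returned tuple shape, and the exact error strings other than the two quoted in this log ("Failed to parse UID", "UID range cannot be 0.") are assumptions. Numbers are tried before booleans, and the range check comes last, as a neighbouring commit in this log describes.

```python
from typing import Optional, Tuple

DEFAULT_UID_RANGE = 65536
_TRUE = {"yes", "y", "true", "t", "on"}
_FALSE = {"no", "n", "false", "f", "off"}


def _parse_uint(text: str) -> Optional[int]:
    """Parse a plain ASCII decimal number, or return None."""
    if text.isascii() and text.isdigit():
        return int(text)
    return None


def parse_private_users(arg: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (mode, uid_shift, uid_range) for a --private-users= argument."""
    if arg == "pick":
        return ("pick", None, DEFAULT_UID_RANGE)

    uid_text, sep, range_text = arg.partition(":")
    uid = _parse_uint(uid_text)

    if not sep and uid is None:
        # Not a number and no range given: only now try a boolean.
        lowered = arg.lower()
        if lowered in _TRUE:
            return ("yes", None, DEFAULT_UID_RANGE)
        if lowered in _FALSE:
            return ("no", None, None)

    if uid is None:
        raise ValueError("Failed to parse UID")

    uid_range = DEFAULT_UID_RANGE
    if sep:
        parsed = _parse_uint(range_text)
        if parsed is None:
            raise ValueError("Failed to parse UID range")
        uid_range = parsed

    # Checked last, so "foobar:0" reports the UID problem first.
    if uid_range <= 0:
        raise ValueError("UID range cannot be 0.")
    return ("fixed", uid, uid_range)
```

With this ordering, `parse_private_users("0")` and `parse_private_users("0:65536")` give the same result, while `"0:0"` is rejected with the range error.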
2016-10-09nspawn: reindent tableZbigniew Jędrzejewski-Szmek
2016-10-08nspawn: also fall back to legacy cgroup hierarchy for old containersZbigniew Jędrzejewski-Szmek
The current systemd version detection routine cannot detect systemd 230, only systemd >= 231. This means that we'll still use the legacy hierarchy in some cases where we wouldn't have to. If somebody figures out a nice way to detect systemd 230, this can be improved later.
2016-10-08nspawn: use mixed cgroup hierarchy only when container has new systemdZbigniew Jędrzejewski-Szmek
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy. So make an educated guess, and use the mixed hierarchy in that case. Tested by running the host with mixed hierarchy (i.e. simply using a recent kernel with systemd from git), and booting first a container with older systemd, and then one with a newer systemd. Fixes #4008.
2016-10-08nspawn: fix spurious reboot if container process returns 133Zbigniew Jędrzejewski-Szmek