summaryrefslogtreecommitdiff
path: root/src/nspawn
AgeCommit message (Collapse)Author
2017-01-10build-sys: add check for gperf lookup function signature (#5055)Mike Gilbert
gperf-3.1 generates lookup functions that take a size_t length parameter instead of unsigned int. Test for this at configure time. Fixes: https://github.com/systemd/systemd/issues/5039
2016-12-29nspawn: reword notice when /dev is pre-mounted and populated (#4971)Lennart Poettering
Fixes: #4676
2016-12-21nspawn: unref the notify event source (#4941)Evgeny Vereshchagin
Fixes: ``` sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b ... ==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15 ==21224== at 0x4C2FA50: calloc (vg_replace_malloc.c:711) ==21224== by 0x4F6F565: sd_event_new (sd-event.c:431) ==21224== by 0x1210BE: run (nspawn.c:3351) ==21224== by 0x123908: main (nspawn.c:3826) ==21224== ==21224== LEAK SUMMARY: ==21224== definitely lost: 656 bytes in 1 blocks ==21224== indirectly lost: 1,788 bytes in 11 blocks ==21224== possibly lost: 0 bytes in 0 blocks ==21224== still reachable: 8,344 bytes in 3 blocks ==21224== suppressed: 0 bytes in 0 blocks ``` Closes #4934
2016-12-14nspawn: flush out environment block of the -a stub init processLennart Poettering
The container detection code in virt.c we ship checks for /proc/1/environ, looking for "container=" in it. Let's make sure our "-a" init stub exposes that correctly. Without this "systemd-detect-virt" run in a "-a" container won't detect that it is being run in a container.
2016-12-13nspawn: when getting SIGCHLD make sure it's from the first child (#4855)Andrey Ulanov
When getting SIGCHLD we should not assume that it was the first child forked from system-nspawn that has died as it may also be coming from an orphan process. This change adds a signal handler that ignores SIGCHLD unless it came from the first containerized child - the real child. Before this change the problem can be reproduced as follows: $ sudo systemd-nspawn --directory=/container-root --share-system Press ^] three times within 1s to kill container. [root@andreyu-coreos ~]# { true & } & [1] 22201 [root@andreyu-coreos ~]# Container root-fedora-latest terminated by signal KILL
2016-12-10Merge pull request #4795 from poettering/dissectZbigniew Jędrzejewski-Szmek
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10nspawn: add missing -E to getopt_long (#4860)Wim de With
2016-12-07nspawn: resolv.conf might not be created initially (#4799)Franck Bui
This might happen that resolv.conf is missing in a minimal rootfs and in this case the following warning is emitted: Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory This patch fixes this case.
2016-12-07nspawn/dissect: automatically discover dm-verity verity partitionsLennart Poettering
This adds support for discovering and making use of properly tagged dm-verity data integrity partitions. This extends both systemd-nspawn and systemd-dissect with a new --root-hash= switch that takes the root hash to use for the root partition, and is otherwise fully automatic. Verity partitions are discovered automatically by GPT table type UUIDs, as listed in https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/ (which I updated prior to this change, to include new UUIDs for this purpose. mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images that carry the necessary integrity data. With that PR and this commit, the following simply lines suffice to boot up an integrity-protected container image: ``` # mkdir test # cd test # mkosi --verity # systemd-nspawn -i ./image.raw -bn ``` Note that mkosi writes the image file to "image.raw" next to a a file "image.roothash" that contains the root hash. systemd-nspawn will look for that file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07nspawn: when generating a machine name from an image name, truncate .raw suffixLennart Poettering
Let's prettify the machine name we generate for image-based containers: let's chop off the .raw suffix before using it as machine name.
2016-12-07dissect: add support for encrypted imagesLennart Poettering
This adds support to the image dissector to deal with encrypted images (only LUKS). Given that we now have a neatly isolated image dissector codebase, let's add a new feature to it: support for automatically dealing with encrypted images. This is then exposed in systemd-dissect and nspawn. It's pretty basic: only support for passphrase-based encryption. In order to ensure that "systemd-dissect --mount" results in mount points whose backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at the moment doesn't provide a proper API for this. Thankfully, the ioctl() API is pretty easy to use.
2016-12-07nspawn: port nspawn to new generalized image dissection codeLennart Poettering
Let's make use of the new internal API. This mostly doesn't change anything for the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new code can handle disk images with no partition tables, and make any detected images directly the root.
2016-12-06core: introduce parse_ip_port (#4825)Susant Sahani
1. Listed in TODO. 2. Tree wide replace safe_atou16 with parse_ip_port incase it's used for ports.
2016-12-05nspawn: don't hide --bind=/tmp/* mounts (#4824)Evgeny Vereshchagin
Fixes #4789
2016-12-01util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENTLennart Poettering
As suggested by @keszybz
2016-12-01nspawn: improve log messagesLennart Poettering
When complaining about the inability to resolve a path, show the full path, not just the relative one. As suggested by @keszybz.
2016-12-01nspawn: optionally, automatically allocated --bind=/--overlay source from ↵Lennart Poettering
/var/tmp This extends the --bind= and --overlay= syntax so that an empty string as source/upper directory is taken as request to automatically allocate a temporary directory below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination with the "+" path extension this permits a switch "--overlay=+/var::/var" in order to use the container's shipped /var, combine it with a writable temporary directory and mount it to the runtime /var of the container.
2016-12-01nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"Lennart Poettering
If a source path is prefixed with "+" it is taken relative to the container's root directory instead of the host. This permits easily establishing bind and overlay mounts based on data from the container rather than the host. This also reworks custom_mounts_prepare(), and turns it into two functions: one custom_mount_check_all() that remains in nspawn.c but purely verifies the validity of the custom mounts configured. And one called custom_mount_prepare_all() that actually does the preparation step, sorts the custom mounts, resolves relative paths, and allocates temporary directories as necessary.
2016-12-01tree-wide: set SA_RESTART for signal handlers we installLennart Poettering
We already set it in most cases, but make sure to set it in all others too, and document that that's a good idea.
2016-12-01nspawn: add ability to configure overlay mounts to .nspawn filesLennart Poettering
Fixes: #4634
2016-12-01nspawn: split out overlayfs argument parsing into a function of its ownLennart Poettering
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and bind_mount_parse().
2016-12-01nspawn: use -ENOMEM instead of log_oom() in one caseLennart Poettering
The function is of the "library" kind and doesn't log ENOMEM in all other cases, hence fix the one outlier.
2016-12-01nspawn: make use of CHASE_NON_EXISTING when locking imageLennart Poettering
If --template= is used on an image, then the image might not exist initially. We can use CHASE_NON_EXISTING to properly lock the image already before it exists. Let's do so.
2016-12-01nspawn: use the new CHASE_NON_EXISTING flag when resolving mount pointsLennart Poettering
This restores the ability to implicitly create files/directories to mount specified mount points on.
2016-12-01fs-util: add flags parameter to chase_symlinks()Lennart Poettering
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of chase_symlinks_prefix().
2016-12-01nspawn: use chase_symlinks() on all paths specified via --tmpfs=, --bind= ↵Lennart Poettering
and so on Fixes: #2860
2016-12-01nspawn: coding style: don't mix variable declarations and function callsLennart Poettering
2016-12-01nspawn: use realloc_multiply() where it makes senseLennart Poettering
2016-12-01nspawn: accept --ephemeral --template= as alternative for --ephemeral ↵Lennart Poettering
--directory= As suggested in PR #3667. This PR simply ensures that --template= can be used as alternative to --directory= when --ephemeral is used, following the logic that for ephemeral options the source directory is actually a template. This does not deprecate usage of --directory= with --ephemeral, as I am not convinced the old logic wouldn't make sense. Fixes: #3667
2016-12-01nspawn: properly handle image/directory paths that are symlinksLennart Poettering
This resolves any paths specified on --directory=, --template=, and --image= before using them. This makes sure nspawn can be used correctly on symlinked images and directory trees. Fixes: #2001
2016-12-01tree-wide: stop using canonicalize_file_name(), use chase_symlinks() insteadLennart Poettering
Let's use chase_symlinks() everywhere, and stop using GNU canonicalize_file_name() everywhere. For most cases this should not change behaviour, however increase exposure of our function to get better tested. Most importantly in a few cases (most notably nspawn) it can take the correct root directory into account when chasing symlinks.
2016-11-22nspawn: don't require chown() if userns is not onLennart Poettering
Fixes: #4711
2016-11-22nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshotLennart Poettering
Given that other file systems (notably: xfs) support reflinks these days, let's extend the file system snapshotting logic to fall back to plan copies or reflinks when full btrfs subvolume snapshots are not available. This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn --template=" available on non-btrfs subvolumes. Of course, both operations will still be slower on non-btrfs than on btrfs (simply because reflinking each file individually in a directory tree is still slower than doing this in one step for a whole subvolume), but it's probably good enough for many cases, and we should provide the users with the tools, they have to figure out what's good for them. Note that "machinectl clone" already had a fallback like this in place, this patch generalizes this, and adds similar support to our other cases.
2016-11-22nspawn: remove temporary root directory on exitLennart Poettering
When mountint a loopback image, we need a temporary root directory we can mount stuff to. Make sure to actually remove it when exiting, so that we don't leave stuff around in /tmp unnecessarily. See: #4664
2016-11-22nspawn: try to wait for the container PID 1 to exit, before we exitLennart Poettering
Let's make the shutdown logic synchronous, so that there's a better chance to detach the loopback device after use.
2016-11-22nspawn: support ephemeral boots from imagesLennart Poettering
Previously --ephemeral was only supported with container trees in btrfs subvolumes (i.e. in combination with --directory=). This adds support for --ephemeral in conjunction with disk images (i.e. --image=) too. As side effect this fixes that --ephemeral was accepted but ignored when using -M on a container that turned out to be an image. Fixes: #4664
2016-11-18Merge pull request #4395 from s-urbaniak/rw-supportLennart Poettering
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18nspawn: R/W support for /sys, and /proc/sysSergiusz Urbaniak
This commit adds the possibility to leave /sys, and /proc/sys read-write. It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE to enable this feature. If set to "yes", /sys, and /proc/sys will be read-write. If set to "no", /sys, and /proc/sys will be read-only. If set to "network" /proc/sys/net will be read-write. This is useful in use-cases, where systemd-nspawn is used in an external network namespace. This adds the possibility to start privileged containers which need more control over settings in the /proc, and /sys filesystem. This is also a follow-up on the discussion from https://github.com/systemd/systemd/pull/4018#r76971862 where an introduction of a simple env var to enable R/W support for those directories was already discussed.
2016-11-14nspawn: restart the whole systemd-nspawn@.service unit on container reboot ↵Zbigniew Jędrzejewski-Szmek
(#4613) Since 133 is now used in a few places, add a #define for it. Also make the status message a bit informative. Another issue introduced in b006762. The logic was borked, we were supposed to return 0 to break the loop, and 133 to restart the container, not the other way around. But this doesn't seem to work, reboot fails with: Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists So actually the version before this patch worked better, since 133 > 0 and we'd at least loop internally.
2016-11-08nspawn: fix condition for mounting resolv.conf (#4622)Christian Hesse
The file /usr/lib/systemd/resolv.conf can be stale, it does not tell us whether or not systemd-resolved is running or not. So check for /run/systemd/resolve/resolv.conf as well, which is created at runtime and hence is a better indication.
2016-11-08Merge pull request #4612 from keszybz/format-stringsZbigniew Jędrzejewski-Szmek
Format string tweaks (and a small fix on 32bit)
2016-11-07nspawn: fix exit code for --help and --version (#4609)Martin Pitt
Commit b006762 inverted the initial exit code which is relevant for --help and --version without a particular reason. For these special options, parse_argv() returns 0 so that our main() immediately skips to the end without adjusting "ret". Otherwise, if an actual container is being started, ret is set on error in run(), which still provides the "non-zero exit on error" behaviour. Fixes #4605.
2016-11-07Rename formats-util.h to format-util.hZbigniew Jędrzejewski-Szmek
We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.
2016-11-07systemd-nspawn: decrease non-fatal mount errors to debug level (#4569)tblume
non-fatal mount errors shouldn't be logged as warnings.
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-11-02nspawn: if we set up a loopback device, try to mount it with "discard"Lennart Poettering
Let's make sure that our loopback files remain sparse, hence let's set "discard" as mount option on file systems that support it if the backing device is a loopback.
2016-10-24seccomp: add new seccomp_init_conservative() helperLennart Poettering
This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence aded additional "*" here and there. Remove that.
2016-10-23nspawn: become a new root earlyEvgeny Vereshchagin
https://github.com/torvalds/linux/commit/036d523641c66bef713042894a17f4335f199e49 > vfs: Don't create inodes with a uid or gid unknown to the vfs It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955 $ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide. Press ^] three times within 1s to kill container. Child died too early. Selected user namespace base 1073283072 and range 65536. Failed to mount to /sys/fs/cgroup/systemd: No such file or directory Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519 Fixes: #4352
2016-10-23nspawn: really lchown(uid/gid)Evgeny Vereshchagin
https://github.com/systemd/systemd/pull/4372#issuecomment-253723849: * `mount_all (outer_child)` creates `container_dir/sys/fs/selinux` * `mount_all (outer_child)` doesn't patch `container_dir/sys/fs` and so on. * `mount_sysfs (inner_child)` tries to create `/sys/fs/cgroup` * This fails 370 stat("/sys/fs", {st_dev=makedev(0, 28), st_ino=13880, st_mode=S_IFDIR|0755, st_nlink=3, st_uid=65534, st_gid=65534, st_blksize=4096, st_blocks=0, st_size=60, st_atime=2016/10/14-05:16:43.398665943, st_mtime=2016/10/14-05:16:43.399665943, st_ctime=2016/10/14-05:16:43.399665943}) = 0 370 mkdir("/sys/fs/cgroup", 0755) = -1 EACCES (Permission denied) * `mount_syfs (inner_child)` ignores that error and mount(NULL, "/sys", NULL, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0 * `mount_cgroups` finally fails
2016-10-23nspawn: use the return value from asprintf instead of checking the pointerZbigniew Jędrzejewski-Szmek
If allocation fails, the value of the point is "undefined". In practice this matters very little, but for consistency with rest of the code, let's check the return value.