Age | Commit message (Collapse) | Author |
|
Add a new --pivot-root argument to systemd-nspawn, which specifies a
directory to pivot to / inside the container; while the original / is
pivoted to another specified directory (if provided). This adds
support for booting container images which may contain several bootable
sysroots, as is common with OSTree disk images. When these disk images
are booted on real hardware, ostree-prepare-root is run in conjunction
with sysroot.mount in the initramfs to achieve the same results.
|
|
There's no point in updating exec_target for each binary we try to
execute, if we override it right-away anyway... Let's just do this once,
and include all binaries we try each time.
Follow-up for 1a68e1e543fd8f899503bec00585a16ada296ef7.
|
|
We use different idioms at different places. Let's replace this is the
one true new idiom, that is even a bit faster...
|
|
The failure message is typically currently:
execv() failed: No such file or directory
which is not very useful because it doesn’t tell you which file or
directory it was trying to exec.
|
|
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.
(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
|
|
First bug fixed by gcc 7. Yikes.
|
|
|
|
nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
|
|
|
|
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.
So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.
This rework hence changes a couple of things:
- We no longer use seccomp_rule_add(), but only
seccomp_rule_add_exact(), and fail the installation of a filter if the
architecture doesn't support it.
- We no longer rely on adding multiple syscall architectures to a single filter,
but instead install a separate filter for each syscall architecture
supported. This way, we can install a strict filter for x86-64, while
permitting a less strict filter for i386.
- All high-level filter additions are now moved from execute.c to
seccomp-util.c, so that we can test them independently of the service
execution logic.
- Tests have been added for all types of our seccomp filters.
- SystemCallFilters= and SystemCallArchitectures= are now implemented in
independent filters and installation logic, as they semantically are
very much independent of each other.
Fixes: #4575
|
|
Fixes #4944
|
|
CID #1368262: fn is allocated with new, so it should be freed.
|
|
|
|
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
|
|
Fixes: #4676
|
|
Handle properly if /etc is a symlink (i.e. make sure we don't follow the
symlink outside the image). Also follow /etc/resolv.conf if it is a
symlink, and use the resolved path when creating a mount point and
mounting (as both of these operations follow symlinks and rally
shouldn't).
Handle more types of read-only errors as debug-level issues.
|
|
There's nothing we can do about it, hence don't complain.
|
|
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.
In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
|
|
This was broken by 19caffac75a2590a0c5ebc2a0214960f8188aec7 which remounted the
root directory to MS_SHARED before applying the volatile mount logic. This
broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we
need MS_MOVE in the volatile mount logic to rearrange the directory tree.
Simply swap the order here, apply the volatile logic before we switch to
MS_SHARED.
|
|
Fixes:
```
sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b
...
==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15
==21224== at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==21224== by 0x4F6F565: sd_event_new (sd-event.c:431)
==21224== by 0x1210BE: run (nspawn.c:3351)
==21224== by 0x123908: main (nspawn.c:3826)
==21224==
==21224== LEAK SUMMARY:
==21224== definitely lost: 656 bytes in 1 blocks
==21224== indirectly lost: 1,788 bytes in 11 blocks
==21224== possibly lost: 0 bytes in 0 blocks
==21224== still reachable: 8,344 bytes in 3 blocks
==21224== suppressed: 0 bytes in 0 blocks
```
Closes #4934
|
|
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
|
|
This moves the VolatileMode enum and its helper functions to src/shared/. This
is useful to then reuse them to implement systemd.volatile= in a later commit.
|
|
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.
Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
|
|
When getting SIGCHLD we should not assume that it was the first
child forked from system-nspawn that has died as it may also be coming
from an orphan process. This change adds a signal handler that ignores
SIGCHLD unless it came from the first containerized child - the real
child.
Before this change the problem can be reproduced as follows:
$ sudo systemd-nspawn --directory=/container-root --share-system
Press ^] three times within 1s to kill container.
[root@andreyu-coreos ~]# { true & } &
[1] 22201
[root@andreyu-coreos ~]#
Container root-fedora-latest terminated by signal KILL
|
|
Generalize image dissection logic of nspawn, and make it useful for other tools.
|
|
|
|
This might happen that resolv.conf is missing in a minimal rootfs and in this
case the following warning is emitted:
Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory
This patch fixes this case.
|
|
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.
Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.
mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:
```
# mkdir test
# cd test
# mkosi --verity
# systemd-nspawn -i ./image.raw -bn
```
Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
|
|
Let's prettify the machine name we generate for image-based containers: let's
chop off the .raw suffix before using it as machine name.
|
|
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.
It's pretty basic: only support for passphrase-based encryption.
In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.
|
|
Let's make use of the new internal API. This mostly doesn't change anything for
the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new
code can handle disk images with no partition tables, and make any detected
images directly the root.
|
|
1. Listed in TODO.
2. Tree wide replace safe_atou16 with parse_ip_port incase
it's used for ports.
|
|
Fixes #4789
|
|
As suggested by @keszybz
|
|
When complaining about the inability to resolve a path, show the full path, not
just the relative one.
As suggested by @keszybz.
|
|
/var/tmp
This extends the --bind= and --overlay= syntax so that an empty string as source/upper
directory is taken as request to automatically allocate a temporary directory
below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination
with the "+" path extension this permits a switch "--overlay=+/var::/var" in
order to use the container's shipped /var, combine it with a writable temporary
directory and mount it to the runtime /var of the container.
|
|
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.
This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
|
|
We already set it in most cases, but make sure to set it in all others too, and
document that that's a good idea.
|
|
Fixes: #4634
|
|
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and
bind_mount_parse().
|
|
The function is of the "library" kind and doesn't log ENOMEM in all other
cases, hence fix the one outlier.
|
|
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
|
|
This restores the ability to implicitly create files/directories to mount
specified mount points on.
|
|
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
|
|
and so on
Fixes: #2860
|
|
|
|
|
|
--directory=
As suggested in PR #3667.
This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.
This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.
Fixes: #3667
|
|
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.
Fixes: #2001
|
|
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
|