Age | Commit message (Collapse) | Author |
|
This adds a new switch --as-pid2, which allows running commands as PID 2, while a stub init process is run as PID 1.
This is useful in order to run arbitrary commands in a container, as PID1's semantics are different from all other
processes regarding reaping of unknown children or signal handling.
|
|
Fixes: #2192
|
|
Fixes #2186
This fixes fall-out from 574edc90066c3faeadcf4666928ed9b0ac409c75.
|
|
Fixes #2091
|
|
|
|
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
|
|
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
|
|
|
|
The new switch operates like --network-veth, but may be specified
multiple times (to define multiple link pairs) and allows flexible
definition of the interface names.
This is an independent reimplementation of #1678, but defines different
semantics, keeping the behaviour completely independent of
--network-veth. It also comes will full hook-up for .nspawn files, and
the matching documentation.
|
|
[v2] treewide: treatment of errno and other cleanups
|
|
with small manual cleanups for style.
|
|
doc: typo and ortho fixes
|
|
We were hardcoding "systemd-nspawn" as the value of the $container env
variable and "nspawn" as the service string in machined registration.
This commit allows the user to configure it by setting the
$SYSTEMD_NSPAWN_CONTAINER_SERVICE env variable when calling
systemd-nspawn.
If $SYSTEMD_NSPAWN_CONTAINER_SERVICE is not set, we use the string
"systemd-nspawn" for both, fixing the previous inconsistency.
|
|
|
|
|
|
|
|
The S_ISREG test does not set errno, so don't use it in the error
message.
|
|
Use the "return log_error_errno(...)" idiom to have fewer curly braces.
The last hunk also fixes the return value of setup_journal(), but the
fix has no practical effect.
|
|
Our functions return negative error codes.
Do not rely on errno being set after calling our own functions.
|
|
|
|
|
|
|
|
|
|
capability-util.[ch]
The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that this are just our utility headers and not any official
upstream headers.
|
|
|
|
|
|
|
|
|
|
|
|
socket-util.[ch]
|
|
|
|
There are more than enough to deserve their own .c file, hence move them
over.
|
|
string-util.[ch]
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
|
|
Let's introduce a common function that makes relative paths absolute and
warns about any errors while doing so.
|
|
get_current_dir_name() can return a variety of errors, not just ENOMEM,
hence don't blindly turn its errors to ENOMEM, but return correct errors
in path_make_absolute_cwd().
This trickles down into a couple of other functions, some of which
receive unrelated minor fixes too with this commit.
|
|
Othewise we might follow the symlinks on the host, instead of the
container.
Fixes #1400
|
|
Make sure we acquire CAP_NET_ADMIN if we require virtual networking.
Make sure we imply virtual ethernet correctly when bridge is request.
Fixes: #1511
Fixes: #1554
Fixes: #1590
|
|
With this change we understand more than just leaf quota groups for
btrfs file systems. Specifically:
- When we create a subvolume we can now optionally add the new subvolume
to all qgroups its parent subvolume was member of too. Alternatively
it is also possible to insert an intermediary quota group between the
parent's qgroups and the subvolume's leaf qgroup, which is useful for
a concept of "subtree" qgroups, that contain a subvolume and all its
children.
- The remove logic for subvolumes has been updated to optionally remove
any leaf qgroups or "subtree" qgroups, following the logic above.
- The snapshot logic for subvolumes has been updated to replicate the
original qgroup setup of the source, if it follows the "subtree"
design described above. It will not cover qgroup setups that introduce
arbitrary qgroups, especially those orthogonal to the subvolume
hierarchy.
This also tries to be more graceful when setting up /var/lib/machines as
btrfs. For example, if mkfs.btrfs is missing we don't even try to set it
up as loopback device.
Fixes #1559
Fixes #1129
|
|
Since v3.11/7dc5dbc ("sysfs: Restrict mounting sysfs"), the kernel
doesn't allow mounting sysfs if you don't have CAP_SYS_ADMIN rights over
the network namespace.
So the mounting /sys as a tmpfs code introduced in
d8fc6a000fe21b0c1ba27fbfed8b42d00b349a4b doesn't work with user
namespaces if we don't use private-net. The reason is that we mount
sysfs inside the container and we're in the network namespace of the host
but we don't have CAP_SYS_ADMIN over that namespace.
To fix that, we mount /sys as a sysfs (instead of tmpfs) if we don't use
private network and ignore the /sys-as-a-tmpfs code if we find that /sys
is already mounted as sysfs.
Fixes #1555
|
|
Previously, we'd allocate the TTY, spawn a service on it, but
immediately start processing the TTY and forwarding it to whatever the
commnd was started on. This is however problematic, as the TTY might get
actually opened only much later by the service. We'll hence first get
EIOs on the master as the other side is still closed, and hence
considered it hung up and terminated the session.
With this change we add a flag to the pty forwarding logic:
PTY_FORWARD_IGNORE_INITIAL_VHANGUP. If set, we'll ignore all hangups
(i.e. EIOs) on the master PTY until the first byte is successfully read.
From that point on we consider a hangup/EIO a regular connection termination. This
way, we handle the race: when we get EIO initially we'll ignore it,
until the connection is properly set up, at which time we start
honouring it.
|
|
sysfs below it
This way we can hide things like /sys/firmware or /sys/hypervisor from
the container, while keeping the device tree around.
While this is a security benefit in itself it also allows us to fix
issue #1277.
Previously we'd mount /sys before creating the user namespace, in order
to be able to mount /sys/fs/cgroup/* beneath it (which resides in it),
which we can only mount outside of the user namespace. To ensure that
the user namespace owns the network namespace we'd set up the network
namespace at the same time as the user namespace. Thus, we'd still see
the /sys/class/net/ from the originating network namespace, even though
we are in our own network namespace now. With this patch, /sys is
mounted before transitioning into the user namespace as tmpfs, so that
we can also mount /sys/fs/cgroup/* into it this early. The directories
such as /sys/class/ are then later added in from the real sysfs from
inside the network and user namespace so that they actually show whatis
available in it.
Fixes #1277
|
|
We didn#t actually pass ownership of /run to the UID in the container
since some releases, let's fix that.
|
|
|
|
This also allows us to drop build.h from a ton of files, hence do so.
Since we touched the #includes of those files, let's order them properly
according to CODING_STYLE.
|
|
This is highly complex code after all, we really should make sure to
only keep one implementation of this extremely difficult function
around.
|
|
|
|
Also, make it slightly more powerful, by accepting a flags argument, and
make it safe for handling if more than one cmsg attribute happens to be
attached.
|
|
A bunch of "Client -> Child" fixes and one barrier-enumerator fix.
(David: rebased on master)
|
|
(David: Note, this is just a cleanup and doesn't fix any bugs)
|
|
Introduce two new helpers that send/receive a single fd via a unix
transport. Also make nspawn use them instead of hard-coding it.
Based on a patch by Krzesimir Nowak.
|