Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
them
TODO: a better commit message
|
|
|
|
|
|
To demonstrate the breakage in the chown part: using an interactive
terminal, spawn a shell in a container with --register=no and with userns
enabled. It will end up chown()ing the cgroup of your terminal session to the
container! And you will be left with that after you quit the container!
Similarly, the subcgroup bit will try to create subcgroups for the parent and
child even if they share the cgroup with other processes (as they are likely
to if --register=no); it will find only partial success, leaving the cgroup
with all controllers disabled.
What we really care about is if the child process is alone in the cgroup, so
we'll take a peek at cgroup.procs for that cgroup to find out.
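A minimal sketch of the "is the child alone in the cgroup" check, assuming a cgroup.procs-style file with one decimal PID per line. The helper name pid_is_alone_in_cgroup() and its return convention are hypothetical and are not taken from the nspawn source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not the actual nspawn code.
 * Returns 1 if `pid` is the only PID listed in the file at `procs_path`,
 * 0 if other PIDs are listed (or `pid` is missing), and -1 if the file
 * cannot be opened. */
static int pid_is_alone_in_cgroup(const char *procs_path, pid_t pid) {
        FILE *f;
        long entry;
        bool seen = false, others = false;

        assert(procs_path);

        f = fopen(procs_path, "r");
        if (!f)
                return -1;

        /* cgroup.procs holds one decimal PID per line. */
        while (fscanf(f, "%ld", &entry) == 1) {
                if ((pid_t) entry == pid)
                        seen = true;
                else
                        others = true;
        }

        fclose(f);
        return seen && !others;
}
```

A caller would skip both the chown() and the subcgroup creation unless this returns 1.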
|
|
|
|
sync_cgroup() can sync name=systemd->unified or unified->name=systemd,
depending on the setup. However, the names of things, comments, and error
messages all assume (and give the false impression) that it only goes
name=systemd->unified.
|
|
mount_legacy_cgns_supported() is very clearly meant to be a version of
mount_legacy_cgns_unsupported() modified to cope with the fact that it has
already chroot()ed, and thus can't look at the host /sys. So, the loops
and such look similar.
However, to cope with the fact that it can't look at /sys, it deals with
hierarchies in the outermost loop, rather than controllers. Yet, it kept
the list variable named "controllers". That's confusing.
|
|
cgroup_setup()
|
|
Yes, the relevant functions in cgroup-util actually do cache the values
with static variables. But passing it around as a value makes the flow
much nicer. The symmetry of having both the inner and outer cg versions
as a CGroupUnified enum makes the code much easier to grok; this could be
done with cg_version(), but I still think this is more readable.
|
|
|
|
|
|
|
|
|
|
Naming it arg_uid_shift is confusing because of the global arg_uid_shift in
nspawn.c
|
|
The `--help` text lies about what the `-U` flag does, and under-documents
the `--private-users` values. Fix that.
|
|
One of the things that tmpfs_patch_options does is take an (optional) UID,
and insert "uid=${UID},gid=${UID}" into the options string. So we need a
uid_t argument, and a way of telling if we should use it. Fortunately,
that is built in to the uid_t value by having UID_INVALID as a possible
value.
So this is really a feature that requires one argument. Yet, it is somehow
taking 4! That is absurd. Simplify it to only take one argument, and have
that trickle all the way up to mount_all()'s usage.
Now, in many of the uses, the argument becomes
uid_shift == 0 ? UID_INVALID : uid_shift
because it used to treat uid_shift=0 as invalid unless the patch_ids flag
was also set. This keeps the behavior the same. Note that in all cases
where it is invoked, if !userns, then uid_shift is 0; we don't have to add
any checks for that.
That said, I'm pretty sure that "uid=0" and not setting "uid=" are the
same, but Christian Brauner seemed to not think so when implementing the
cgns support. https://github.com/systemd/systemd/pull/3589
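The one-argument shape described above can be sketched as follows. This is an illustration only: patch_tmpfs_options() and shift_to_tmpfs_uid() are hypothetical names, the buffer-based signature is an assumption made to keep the example self-contained, and UID_INVALID is defined locally as (uid_t) -1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Local definition for this sketch: the "no UID" sentinel value. */
#define UID_INVALID ((uid_t) -1)

/* Hypothetical sketch, not the real tmpfs_patch_options().
 * Copies `options` into `buf` and appends "uid=N,gid=N" only when `uid`
 * is valid. Returns 0 on success, -1 if `buf` is too small. */
static int patch_tmpfs_options(const char *options, uid_t uid, char *buf, size_t size) {
        int n;

        assert(buf);

        if (uid == UID_INVALID)
                n = snprintf(buf, size, "%s", options ? options : "");
        else if (options && *options)
                n = snprintf(buf, size, "%s,uid=%lu,gid=%lu", options,
                             (unsigned long) uid, (unsigned long) uid);
        else
                n = snprintf(buf, size, "uid=%lu,gid=%lu",
                             (unsigned long) uid, (unsigned long) uid);

        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Mirrors the expression quoted in the message: a shift of 0 means
 * "do not add uid= at all". */
static uid_t shift_to_tmpfs_uid(uid_t uid_shift) {
        return uid_shift == 0 ? UID_INVALID : uid_shift;
}
```

With this shape, callers make the "use it or not" decision once, when they compute the uid_t, instead of passing extra flags down the call chain.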
|
|
The comment explains the obvious, but doesn't even mention the tricky part.
Of course we need to set things up before we remount read-only! That's
the general theme of the function!
What was totally non-obvious is why we only need to create it if
cg_ns_supported(), as the directory needs to exist no matter what. From
reading the code, I was convinced that it was broken on pre-cgns kernels
(pre-4.6, unless a distro backported it).
So explain that skipping the creation if !cg_ns_supported() is an optimization.
|
|
Remove ", arbitrary named hierarchies" from the list of things that
cg_kernel_controllers() might return; /proc/cgroups does not contain
"name=" pseudo-controllers (at least in any version of the kernel that I am
aware of).
If there are kernels out there that do put "name=" pseudo-controllers in
/proc/cgroups, then the code that runs when SYSTEMD_NSPAWN_USE_CGNS=no is
broken on these kernels. So there's precedent to ignoring these kernels,
if they do exist.
|
|
It's silly that every time we check arg_use_cgns we also have to check
cg_ns_supported().
So, simplify these checks and force arg_use_cgns = false if the kernel
doesn't support cgroup namespaces (that is, if !cg_ns_supported()).
|
|
|
|
First bug fixed by gcc 7. Yikes.
(cherry picked from commit 9ce6d1b319f8655100af6ecf5fd57e4558d57dd1)
|
|
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, and a comment is just as good. But gcc has some very specific
requirements for how the comment should look. Adjust it to the specific form
that it likes. I don't think the extra stuff we had in those comments was
adding much value.
(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
(cherry picked from commit ec251fe7d5bc24b5d38b0853bc5969f3a0ba06e2)
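As an illustration of the comment form, here is a minimal sketch. The function cleanup_steps_from() is a made-up example, not code from this tree; the assumption is that a plain "fall through" comment placed directly before the next case label is among the forms gcc accepts at level 3.

```c
#include <assert.h>

/* Hypothetical example: counts how many cleanup steps run when starting
 * from `stage`, with each stage deliberately falling into the next. */
static int cleanup_steps_from(int stage) {
        int steps = 0;

        switch (stage) {
        case 2:
                steps++;
                /* fall through */
        case 1:
                steps++;
                /* fall through */
        case 0:
                steps++;
                break;
        default:
                break;
        }

        return steps;
}
```

The comment has to be the last thing before the following case label; anything else between them makes the warning reappear.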
|
|
It also used __bitwise and __force. It seems easier to rename
our versions since they are local to this one single header.
Also, undefine them afterwards, so that we don't pollute the
preprocessor macro namespace.
(cherry picked from commit dc66f33a16596c2886a24da12e56ec096214e124)
|
|
(cherry picked from commit 2e1f244efd2dfc1a60d032bef3d88b9ba6e0444b)
|
|
cgroup mode detection is broken in two different ways.
* detect_unified_cgroup_hierarchy() is called too nested in outer_child().
  sync_cgroup() which is used by run() also needs to know the requested cgroup
  mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN. This makes it
  skip syncing the inner cgroup hierarchy on some config combinations.
    $ cat /proc/self/cgroup | grep systemd
    1:name=systemd:/user.slice/user-0.slice/session-c1.scope
    $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
    ...
    [root@container ~]# cat /proc/self/cgroup | grep systemd
    1:name=systemd:/machine.slice/machine-container.x86_64.scope
    $ exit
    $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
    [root@container ~]# cat /proc/self/cgroup | grep 0::
    0::/
    $ exit
  Note how the unified hierarchy case's path is not synchronized with the host.
  This for example can cause issues when there are multiple such containers.
  Fixed by moving detect_unified_cgroup_hierarchy() invocation to main().
* inner_child() was invoking cg_unified_flush(). inner_child() executes fully
  scoped and can't determine which cgroup mode the host was in. It doesn't
  make sense to keep flushing the detected mode when the host mode can't
  change.
  Fixed by replacing cg_unified_flush() invocations in outer_child() and
  inner_child() with one in main().
(cherry picked from commit bd15ab41a1347fed8266845f875842d1502e02a6)
|
|
|
|
|
|
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
|
|
|
|
sed -i 's|Linux Boot Manager|Systemd Boot Manager|' src/boot/bootctl.c
|
|
|
|
|
|
As far as I can tell, no code in this repository actually uses the ID
field, so this is just a man page change.
|
|
|
|
|
|
This is not a blind replacement of "Linux" with "GNU/Linux". In some
cases, "Linux" is (correctly) used to refer to just the kernel. In others,
it is in a string for which code must also be adjusted; these instances
are not included in this commit.
|
|
|
|
This is a v232-applicable version of upstream c9fd987279a462e.
|
|
lz4 upstream decided to switch to an incompatible numbering scheme
(1.7.3 follows 131, to match the so version).
PKG_CHECK_MODULES does not allow two version matches for the same package,
so e.g. lz4 < 10 || lz4 >= 125 cannot be used. Check twice, once for
"new" numbers (anything below 10 is assumed to be new), once for the "old"
numbers (anything >= 125). This assumes that the "new" versioning
will not get to 10 too quickly. I think that's a safe assumption; lz4 is a
mature project.
Fixed #4690.
|
|
Make sure to populate the cache in cache_space_refresh() at least once;
otherwise it's possible that the system boots fast enough (and the journal
flush service is finished) before the invalidate cache timeout (30 us) has
expired.
Fixes: #4790
|
|
Commit b006762 inverted the initial exit code which is relevant for --help and
--version without a particular reason. For these special options, parse_argv()
returns 0 so that our main() immediately skips to the end without adjusting
"ret". Otherwise, if an actual container is being started, ret is set on error
in run(), which still provides the "non-zero exit on error" behaviour.
Fixes #4605.
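The exit-code flow described above can be modelled roughly as follows. This is a simplified sketch, not the real main(): exit_code_for() is a hypothetical function whose two parameters stand in for the results of parse_argv() and run(), and the handling of a negative parse_argv() result is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the control flow; not the actual nspawn main(). */
static int exit_code_for(int parse_argv_result, int run_result) {
        int ret = EXIT_SUCCESS;         /* initial value: success */

        if (parse_argv_result < 0)
                return EXIT_FAILURE;    /* assumption: a bad command line fails */

        if (parse_argv_result == 0)
                return ret;             /* --help/--version: skip to the end */

        if (run_result < 0)
                ret = EXIT_FAILURE;     /* the container path sets ret on error */

        return ret;
}
```

Because ret starts out as success, the --help and --version paths exit with 0 without needing any special casing.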
|