Age | Commit message (Collapse) | Author |
|
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.
A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).
It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.
The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.
This patch also removes cg_delete() which is unused now.
On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.
This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.
This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.
The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.
To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.
This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.
When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
|
|
|
|
Never report errors twice.
|
|
|
|
When showing the number of tasks in a cgroup, recursively count tasks in
child cgroups and include them in the number. This ensures that the
number of tasks is cummulative the same way as memory, cpu and IO
resources are.
Old behaviour can be restored by passing the new --recursive=no switch.
|
|
However, allow them to be counted in by specifying -k
|
|
This way the output is restricted to cgroups from a container when run
in one.
|
|
In preparation of the unified cgroup support, let's clean up cgtop:
a) rework time code to be based on "nsec_t" rather than "struct timespec"
b) Introduce long option --order= for selecting ordering
c) count number of processes only in the main hierarchy, don't bother
with the controller hierarchies. We don't allow orthogonal
hierarchies in systemd anymore, hence there's no point to check the
other hierarchies.
d) Deal with non-monotonic cpuacct values (see #749)
e) When sorting groups, don't do prefix compare when ordering by number
of tasks, since this is not accumulative for all children.
f) Actually make --cpu without parameter work
g) Don't output control characters when we get them as input.
Fixes #749.
|
|
|
|
since last tick
Emit "0" rather than "-" if no change in IO values are seen for a process since
last tick, so long as accounting has registered content at all.
|
|
- Explicitly flush stdout before sleep between iterations
- Only clear user keystrokes when output is to TTY
- Add a newline between output batches when output is not to TTY
|
|
|
|
|
|
|
|
systemd-cgtop --dept=1 -b -n 10 -d 0.1 | cat
Assertion 'new_length >= 3' failed at src/shared/util.c:3 \
595, function ellipsize_mem(). Aborting.
Aborted (core dumped)
(David: add comment)
|
|
It corrrectly handles both positive and negative errno values.
|
|
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
|
|
That hashmap_move_one() currently cannot fail with -ENOMEM is an
implementation detail, which is not possible to guarantee in general.
Hashmap implementations based on anything else than chaining of
individual entries may have to allocate.
hashmap_move_one will not fail with -ENOMEM if a proper reservation has
been made beforehand. Use reservations in install.c.
In cgtop.c simply propagate the error instead of asserting.
|
|
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
|
|
getopt is usually good at printing out a nice error message when
commandline options are invalid. It distinguishes between an unknown
option and a known option with a missing arg. It is better to let it
do its job and not use opterr=0 unless we actually want to suppress
messages. So remove opterr=0 in the few places where it wasn't really
useful.
When an error in options is encountered, we should not print a lengthy
help() and overwhelm the user, when we know precisely what is wrong
with the commandline. In addition, since help() prints to stdout, it
should not be used except when requested with -h or --help.
Also, simplify things here and there.
|
|
If -flto is used then gcc will generate a lot more warnings than before,
among them a number of use-without-initialization warnings. Most of them
without are false positives, but let's make them go away, because it
doesn't really matter.
|
|
|
|
Among other things this makes sure we always expose a --version command
and show it in the help texts.
|
|
This extends 62678ded 'efi: never call qsort on potentially
NULL arrays' to all other places where qsort is used and it
is not obvious that the count is non-zero.
|
|
The online help shows the keys as uppercase but the code and manpage say
lower case. Make the online help follow reality.
|
|
|
|
|
|
Instead of outputting "5h 55s 50ms 3us" we'll now output "5h
55.050003s". Also, while outputting the accuracy is configurable.
Basically we now try use "dot notation" for all time values > 1min. For
>= 1s we use 's' as unit, otherwise for >= 1ms we use 'ms' as unit, and
finally 'us'.
This should give reasonably values in most cases.
|
|
Internally we store all time values in usec_t, however parse_usec()
actually was used mostly to parse values in seconds (unless explicit
units were specified to define a different unit). Hence, be clear about
this and name the function about what we pass into it, not what we get
out of it.
|
|
|
|
|
|
use default depth from variable for --help
|
|
Also split out some fileio functions to fileio.c and provide a SELinux
aware pendant in fileio-label.c
see https://bugzilla.redhat.com/show_bug.cgi?id=881577
|
|
|
|
|
|
Return codes in systemd are negated and
if (r < 0) if (r == ENOENT)
was never true.
|
|
Adds messages for formally silent errors: new "Failed on cmdline argument %s: %s".
Removes some specific error messages for -ENOMEM in mount-setup.c. A few specific
ones have been left in other binaries.
|
|
|
|
|
|
[zj: use static]
|
|
|
|
also a number of minor fixups and bug fixes: spelling, oom errors
that didn't print errors, not properly forwarding error codes,
few more consistency issues, et cetera
|
|
glibc/glib both use "out of memory" consistantly so maybe we should
consider that instead of this.
Eliminates one string out of a number of binaries. Also fixes extra newline
in udev/scsi_id
|
|
This is to match strappend() and the other string related functions.
|
|
cgtop quits on startup if all the cgroup mounts it expects are not available.
Just continue without nonexistant ones.
|
|
https://bugs.freedesktop.org/show_bug.cgi?id=49778
|
|
|
|
|