Age | Commit message (Collapse) | Author |
|
core: add cgroup CPU controller support on the unified hierarchy
(zj: merging not squashing to make it clear against which upstream this patch was developed.)
|
|
|
|
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
|
|
Fix man links
|
|
|
|
|
|
sd_journal-query_enumerate was an early draft, the name was changed
to sd_j_enumerate_fields.
|
|
|
|
Serve journals in the specified directory instead of default journals.
|
|
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
|
|
fixes #3881
|
|
Let's extend nss-systemd to also synthesize user/group entries for the
UIDs/GIDs 0 and 65534 which have special kernel meaning. Given that nss-systemd
is listed in /etc/nsswitch.conf only very late any explicit listing in
/etc/passwd or /etc/group takes precedence.
This functionality is useful in minimal container-like setups that lack
/etc/passwd files (or only have incompletely populated ones).
|
|
ExecStop=/ExecStopPost= commands
This should simplify monitoring tools for services, by passing the most basic
information about service result/exit information via environment variables,
thus making it unnecessary to retrieve them explicitly via the bus.
|
|
Update help for "short-full" and shorten to 80 columns
|
|
https://lists.freedesktop.org/archives/systemd-devel/2016-August/037268.html
|
|
Meaning of --all was mentioned in list-dependencies description, but the this
effect should also be mentioned in the description of the option itself.
|
|
nspawn resolv.conf handling improvements, and inherit $TERM all the way through nspawn → console login
|
|
This new output mode formats all timestamps using the usual format_timestamp()
call we use pretty much everywhere else. Timestamps formatted this way are some
ways more useful than traditional syslog timestamps as they include weekday,
month and timezone information, while not being much longer. They are also not
locale-dependent. The primary advantage however is that they may be passed
directly to journalctl's --since= and --until= switches as soon as #3869 is
merged.
While we are at it, let's also add "short-unix" to shell completion.
|
|
This patch improves parsing and generation of timestamps and calendar
specifications in two ways:
- The week day is now always printed in the abbreviated English form, instead
of the locale's setting. This makes sure we can always parse the week day
again, even if the locale is changed. Given that we don't follow locale
settings for printing timestamps in any other way either (for example, we
always use 24h syntax in order to make uniform parsing possible), it only
makes sense to also stick to a generic, non-localized form for the timestamp,
too.
- When parsing a timestamp, the local timezone (in its DST or non-DST name)
may be specified, in addition to "UTC". Other timezones are still not
supported however (not because we wouldn't want to, but mostly because libc
offers no nice API for that). In itself this brings no new features, however
it ensures that any locally formatted timestamp's timezone is also parsable
again.
These two changes ensure that the output of format_timestamp() may always be
passed to parse_timestamp() and results in the original input. The related
flavours for usec/UTC also work accordingly. Calendar specifications are
extended in a similar way.
The man page is updated accordingly, in particular this removes the claim that
timestamps systemd prints wouldn't be parsable by systemd. They are now.
The man page previously showed invalid timestamps as examples. This has been
removed, as the man page shouldn't be a unit test, where such negative examples
would be useful. The man page also no longer mentions the names of internal
functions, such as format_timestamp_us() or UNIX error codes such as EINVAL.
|
|
This setting adds minimal user namespacing support to a service. When set the invoked
processes will run in their own user namespace. Only a trivial mapping will be
set up: the root user/group is mapped to root, and the user/group of the
service will be mapped to itself, everything else is mapped to nobody.
If this setting is used the service runs with no capabilities on the host, but
configurable capabilities within the service.
This setting is particularly useful in conjunction with RootDirectory= as the
need to synchronize /etc/passwd and /etc/group between the host and the service
OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the
user of the service itself. But even outside the RootDirectory= case this
setting is useful to substantially reduce the attack surface of a service.
Example command to test this:
systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh
This runs a shell as user "foobar". When typing "ps" only processes owned by
"root", by "foobar", and by "nobody" should be visible.
|
|
This removes the --share-system switch: from the documentation, the --help text
as well as the command line parsing. It's an ugly option, given that it kinda
contradicts the whole concept of PID namespaces that nspawn implements. Since
it's barely ever used, let's just deprecate it and remove it from the options.
It might be useful as a debugging option, hence the functionality is kept
around for now, exposed via an undocumented $SYSTEMD_NSPAWN_SHARE_SYSTEM
environment variable.
|
|
Update documentation for systemd-vconsole-setup
|
|
Introduce MaxConnectionsPerSource= that is number of concurrent
connections allowed per IP.
RFE: 1939
|
|
This complements graphical-session.target for services which set up the
environment (e. g. dbus-update-activation-environment) and need to run before
the actual graphical session.
|
|
The CPUID and DMI vendor strings do not seem to be documented.
Values were found experimentally and by inspecting the source code.
|
|
They were outdated, and this way it's less likely that they'll get out of sync
again. Anyway, it's easier for the reader to have the kernel and config file
options next to one another.
|
|
|
|
Explain in the systemd.resource-control man that systemd user instance can't use resource control on
cgroup-v1.
|
|
add cgroup-v2.txt link in section "Unified and Legacy Control Group
Hierarchies" of systemd.resource-control man.
|
|
vconsole-setup: updates & fixes V2
|
|
- about namespace
- about udev rules
|
|
In this mode, messages from processes which are not part of the session
land in the main journal file, and only output of processes which are
properly part of the session land in the user's journal. This is
confusing, in particular because systemd-coredump runs outside of the
login session.
"Deprecate" SplitMode=login by removing it from documentation, to
discourage people from using it.
|
|
|
|
|
|
This unit acts as a dynamic "alias" target for any concrete graphical user
session like gnome-session.target; these should declare
"BindsTo=graphical-session.target" so that both targets stop and start at the
same time.
This allows services that run in a particular graphical user session (e. g.
gnome-settings-daemon.service) to declare "PartOf=graphical-session.target"
without having to know or get updated for all/new session types. This will
ensure that stopping the graphical session will stop all services which are
associated to it.
|
|
Id128 fixes and more
|
|
Adressing https://github.com/systemd/systemd/issues/3755#issuecomment-234214273
|
|
Addressing:
https://github.com/systemd/systemd/commit/b541146bf8c34aaaa9efcf58325f18da9253c4ec#commitcomment-17997074
|
|
News and man tweaks
|
|
As suggested by @mbiebl we already use the "!" special char in unit file
assignments for negation, hence we should not use it in a different context for
privileged execution. Let's use "+" instead.
|
|
To "search something", in the meaning of looking for it, is valid,
but "search _for_ something" is much more commonly used, especially when
the meaning could be confused with "looking _through_ something"
(for some other object).
(C.f. "the police search a person", "the police search for a person".)
Also reword the rest of the paragraph to avoid using "automatically"
three times.
|
|
Not as many people use chroot as before, so make the flow a bit nicer by
talking less about chroot.
"change to the either" is awkward and unclear. Just remove that part,
because all changes are lost, period.
|
|
"systemctl enable"
Clarify that "systemctl enable" can operate either on unit names or on unit
file paths (also, adjust the --help text to clarify this). Say that "systemctl
enable" on unit file paths also links the unit into the search path.
Many other fixes.
This should improve the documentation to avoid further confusion around #3706.
|
|
|
|
uuid/id128 code rework
|
|
Let's not mention the supposed security benefit of turning off caching. It is
really questionnable, and I#d rather not create the impression that we actually
believed turning off caching would be a good idea.
Instead, mention that Cache=no is implicit if a DNS server on the local host is
used.
|
|
With this NSS module all dynamic service users will be resolvable via NSS like
any real user.
|
|
service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).
For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.
A simple way to test this is:
systemd-run -p DynamicUser=1 /bin/sleep 99999
|
|
As it turns out 512 is max number of tasks per service is hit by too many
applications, hence let's bump it a bit, and make it relative to the system's
maximum number of PIDs. With this change the new default is 15%. At the
kernel's default pids_max value of 32768 this translates to 4915. At machined's
default TasksMax= setting of 16384 this translates to 2457.
Why 15%? Because it sounds like a round number and is close enough to 4096
which I was going for, i.e. an eight-fold increase over the old 512
Summary:
| on the host | in a container
old default | 512 | 512
new default | 4915 | 2457
|
|
Let's change from a fixed value of 12288 tasks per user to a relative value of
33%, which with the kernel's default of 32768 translates to 10813. This is a
slight decrease of the limit, for no other reason than "33%" sounding like a nice
round number that is close enough to 12288 (which would translate to 37.5%).
(Well, it also has the nice effect of still leaving a bit of room in the PID
space if there are 3 cooperating evil users that try to consume all PIDs...
Also, I like my bikesheds blue).
Since the new value is taken relative, and machined's TasksMax= setting
defaults to 16384, 33% inside of containers is usually equivalent to 5406,
which should still be ample space.
To summarize:
| on the host | in the container
old default | 12288 | 12288
new default | 10813 | 5406
|