Age | Commit message (Collapse) | Author |
|
Lets go further and make /lib/modules/ inaccessible for services that do
not have business with modules, this is a minor improvment but it may
help on setups with custom modules and they are limited... in regard of
kernel auto-load feature.
This change introduce NameSpaceInfo struct which we may embed later
inside ExecContext but for now lets just reduce the argument number to
setup_namespace() and merge ProtectKernelModules feature.
|
|
|
|
The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw
data through /proc, ioctl and some other exotic system calls...
|
|
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.
This option will not prevent the kernel from loading modules using the module
auto-load feature which is a system wide operation.
|
|
Various smaller documentation fixes.
|
|
Add an "invocation ID" concept to the service manager
|
|
Let's clarify that for user services some OS-defined limits bound the settings
in the unit files.
Fixes: #4232
|
|
Fixes: #4116
|
|
also have a unit state of that name
Fixes: #3971
|
|
variables
Document the default pagers used, as well as $SYSTEMD_LESSCHARSET.
Fixes: #4143
|
|
It seems that this count was not updated when snapshot units were
removed in #1841.
|
|
The --no-seal and --no-compress options were dropped and replaced with
boolean functionality. This syncs the man page with the code.
|
|
|
|
This is like the previous reverted commit, but any boolean is still accepted,
not just "yes" and "no". Man page is adjusted to match the code.
|
|
Now that systemd-nspawn@.service includes -U, more users might be interested
in this tidbit ;)
|
|
This patch enables to configure
IFA_F_HOMEADDRESS
IFA_F_NODAD
IFA_F_MANAGETEMPADDR
IFA_F_NOPREFIXROUTE
IFA_F_MCAUTOJOIN
|
|
resolved: add an option to disable the stub resolver
|
|
|
|
Let's add documentation about SD_ID128_NULL and sd_id128_is_null().
Let's also indent our examples by 8chs, as is generally our coding style.
|
|
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
|
|
This patch adds support to remote checksum checksum offload to VXLAN.
This patch adds RemoteCheckSumTx and RemoteCheckSumRx vxlan configuration
to enable remote checksum offload for transmit and receive on the VXLAN tunnel.
|
|
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough.
Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
|
|
Add seccomp support for the s390 architecture (31-bit and 64-bit)
to systemd.
This requires libseccomp >= 2.3.1.
|
|
Routing-domains-manpage tweak and NEWS update
|
|
This PR removes consecutive duplicate words from the man pages of:
* `resolved.conf.xml`
* `systemd.exec.xml`
* `systemd.socket.xml`
|
|
Put more emphasis on the routing part. This is the more interesting
thing, and also more complicated and novel.
Explain "search domains" as the special case. Also explain the effect of
~. in more detail.
|
|
<entry>-ies must be a single line of text. Otherwise docbook does strange
things to the indentation.
|
|
SYSTEMD_UNIT_PATH=foobar: systemd-analyze verify barbar/unit.service
will load units from barbar/, foobar/, /etc/systemd/system/, etc.
SYSTEMD_UNIT_PATH= systemd-analyze verify barbar/unit.service
will load units only from barbar/, which is useful e.g. when testing
systemd's own units on a system with an older version of systemd installed.
|
|
Fixes #3830
|
|
|
|
It needs to be possible to tell apart "the nss-resolve module does not exist"
(which can happen when running foreign-architecture programs) from "the queried
DNS name failed DNSSEC validation" or other errors. So return NOTFOUND for these
cases too, and only keep UNAVAIL for the cases where we cannot handle the given
address family.
This makes it possible to configure a fallback to "dns" without breaking
DNSSEC, with "resolve [!UNAVAIL=return] dns". Add this to the manpage.
This does not change behaviour if resolved is not running, as that already
falls back to the "dns" glibc module.
Fixes #4157
|
|
resolve includes myhostname functionality, so there is no need to add it again.
|
|
coredump: remove Storage=both support, various fixes for sd-coredump and coredumpctl
|
|
DNS servers which have route-only domains should only be used for
the specified domains. Routing queries about other domains there is a privacy
violation, prone to fail (as that DNS server was not meant to be used for other
domains), and puts unnecessary load onto that server.
Introduce a new helper function dns_server_limited_domains() that checks if the
DNS server should only be used for some selected domains, i. e. has some
route-only domains without "~.". Use that when determining whether to query it
in the scope, and when writing resolv.conf.
Extend the test_route_only_dns() case to ensure that the DNS server limited to
~company does not appear in resolv.conf. Add test_route_only_dns_all_domains()
to ensure that a server that also has ~. does appear in resolv.conf as global
name server. These reproduce #3420.
Add a new test_resolved_domain_restricted_dns() test case that verifies that
domain-limited DNS servers are only being used for those domains. This
reproduces #3421.
Clarify what a "routing domain" is in the manpage.
Fixes #3420
Fixes #3421
|
|
Back when external storage was initially added in 34c10968cb, this mode of
storage was added. This could have made some sense back when XZ compression was
used, and an uncompressed core on disk could be used as short-lived cache file
which does require costly decompression. But now fast LZ4 compression is used
(by default) both internally and externally, so we have duplicated storage,
using the same compression and same default maximum core size in both cases,
but with different expiration lifetimes. Even the uncompressed-external,
compressed-internal mode is not very useful: for small files, decompression
with LZ4 is fast enough not to matter, and for large files, decompression is
still relatively fast, but the disk-usage penalty is very big.
An additional problem with the two modes of storage is that it complicates
the code and makes it much harder to return a useful error message to the user
if we cannot find the core file, since if we cannot find the file we have to
check the internal storage first.
This patch drops "both" storage mode. Effectively this means that if somebody
configured coredump this way, they will get a warning about an unsupported
value for Storage, and the default of "external" will be used.
I'm pretty sure that this mode is very rarely used anyway.
|
|
|
|
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
|
|
There was no certainty about how the path in service file should look
like for usb functionfs activation. Because of this it was treated
differently in different places, which made this feature unusable.
This patch fixes the path to be the *mount directory* of functionfs, not
ep0 file path and clarifies in the documentation that ListenUSBFunction should be
the location of functionfs mount point, not ep0 file itself.
|
|
It seems to me that the explicit positional argument should have higher
priority than "an option".
|
|
|
|
is set
Instead of having a local syscall list, use the @raw-io group which
contains the same set of syscalls to filter.
|
|
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface,
filesystems configuration and IRQ tuning readonly.
Most of these interfaces now days should be in /sys but they are still
available through /proc, so just protect them. This patch does not touch
/proc/net/...
|
|
PrivateDevices=true
|
|
Documentation fixes for ReadWritePaths= and ProtectKernelTunables=
as reported by Evgeny Vereshchagin.
|
|
Let's merge a couple of columns, to make the table a bit shorter. This
effectively just drops whitespace, not contents, but makes the currently
humungous table much much more compact.
|
|
|
|
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=,
InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to
the host file system. (Which wasn't true actually, as we didn't follow symlinks
at all in the most recent releases, and we know do follow them, but relative to
RootDirectory=).
This also replaces all references to the fact that all fs namespacing options
can be undone with enough privileges and disable propagation by a single one in
the documentation of ReadOnlyPaths= and friends, and then directs the read to
this in all other places.
Moreover a hint is added to the documentation of SystemCallFilter=, suggesting
usage of ~@mount in case any of the fs namespacing related options are used.
|
|
Let's drop the reference to the cap_from_name() function in the documentation
for the capabilities setting, as it is hardly helpful. Our readers are not
necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the
strings we accept are just the plain capability names as listed in
capabilities(7) hence there's really no point in confusing the user with
anything else.
|
|
Let's make sure that services that use DynamicUser=1 cannot leave files in the
file system should the system accidentally have a world-writable directory
somewhere.
This effectively ensures that directories need to be whitelisted rather than
blacklisted for access when DynamicUser=1 is set.
|
|
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a
new setting "strict". If set, the entire directory tree of the system is
mounted read-only, but the API file systems /proc, /dev, /sys are excluded
(they may be managed with PrivateDevices= and ProtectKernelTunables=). Also,
/home and /root are excluded as those are left for ProtectHome= to manage.
In this mode, all "real" file systems (i.e. non-API file systems) are mounted
read-only, and specific directories may only be excluded via
ReadWriteDirectories=, thus implementing an effective whitelist instead of
blacklist of writable directories.
While we are at, also add /efi to the list of paths always affected by
ProtectSystem=. This is a follow-up for
b52a109ad38cd37b660ccd5394ff5c171a5e5355 which added /efi as alternative for
/boot. Our namespacing logic should respect that too.
|