Age | Commit message (Collapse) | Author |
|
Add new MountAPIVFS= boolean unit file setting + RootImage=
|
|
Add a bit of code that tries to get the right parameter order in place
for some of the better known architectures, and skips
restrict_namespaces for other archs.
This also bypasses the test on archs where we don't know the right
order.
In this case I didn't bother with testing the case where no filter is
applied, since that is hopefully just an issue for now, as there's
nothing stopping us from supporting more archs, we just need to know
which order is right.
Fixes: #5241
|
|
The compiler warning is a false positive, since n_addresses is always
initialised on the success path from parse_argv(), but the compiler
obviously can’t work that out.
Fixes:
src/test/test-nss.c:426:9: warning: 'n_addresses' may be used uninitialized in this function [-Wmaybe-uninitialized]
|
|
On i386 we block the old mmap() call entirely, since we cannot properly
filter it. Thankfully it hasn't been used by glibc since quite some
time.
Fixes: #5240
|
|
it enabled
If a unit foobar@.service stored below /usr is instantiated via a
symlink foobar@quux.service also below /usr, then we should consider the
instance statically enabled, while the template itself should continue
to be considered enabled/disabled/static depending on its [Install]
section.
In order to implement this we'll now look for enablement symlinks in all
unit search paths, not just in the config and runtime dirs.
Fixes: #5136
|
|
directory for a service
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.
This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
|
|
it a NOP
See: #5215
|
|
explicit_bzero was added in glibc 2.25. Make use of it.
explicit_bzero is hardcoded to zero the memory, so string erase now
truncates the string, instead of overwriting it with 'x'. This causes
a visible difference only in the journalctl case.
|
|
Gcc7 is smarter about detecting unused functions and detects those two functions
which are unused in tests. But gperf generates them for us, so let's instead of removing
tell gcc that we know they might be unused in the test code.
In file included from ../src/test/test-af-list.c:29:0:
./src/basic/af-from-name.h:140:1: warning: ‘lookup_af’ defined but not used [-Wunused-function]
lookup_af (register const char *str, register size_t len)
^~~~~~~~~
In file included from ../src/test/test-arphrd-list.c:29:0:
./src/basic/arphrd-from-name.h:125:1: warning: ‘lookup_arphrd’ defined but not used [-Wunused-function]
lookup_arphrd (register const char *str, register size_t len)
^~~~~~~~~~~~~
|
|
usec_t is always 64bit, which means it can cover quite a number of
years. However, 4 digit year display and glibc limitations around time_t
limit what we can actually parse and format. Let's make this explicit,
so that we never end up formatting dates we can#t parse and vice versa.
Note that this is really just about formatting/parsing. Internal
calculations with times outside of the formattable range are not
affected.
|
|
We use different idioms at different places. Let's replace this is the
one true new idiom, that is even a bit faster...
|
|
networkd: Allow ':' in label
This reverts a341dfe563 and takes a slightly different approach: anything is
allowed in network interface labels, but network interface names are verified
as before (i.e. amongst other things, no colons are allowed there).
|
|
If chase_symlinks() encouters an absolute symlink, it resets the todo
buffer to just the newly discovered symlink and discards any of the
remaining previous symlink path. Regardless of whether or not the
symlink is absolute or relative, we need to preserve the remainder of
the path that has not yet been resolved.
|
|
':' in not a a valid interface name.
|
|
interfaces (#5117)
|
|
|
|
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.
So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.
This rework hence changes a couple of things:
- We no longer use seccomp_rule_add(), but only
seccomp_rule_add_exact(), and fail the installation of a filter if the
architecture doesn't support it.
- We no longer rely on adding multiple syscall architectures to a single filter,
but instead install a separate filter for each syscall architecture
supported. This way, we can install a strict filter for x86-64, while
permitting a less strict filter for i386.
- All high-level filter additions are now moved from execute.c to
seccomp-util.c, so that we can test them independently of the service
execution logic.
- Tests have been added for all types of our seccomp filters.
- SystemCallFilters= and SystemCallArchitectures= are now implemented in
independent filters and installation logic, as they semantically are
very much independent of each other.
Fixes: #4575
|
|
|
|
Add AF_VSOCK socket activation support
|
|
The AF_VSOCK address family facilitates guest<->host communication on
VMware and KVM (virtio-vsock). Adding support to systemd allows guest
agents to be launched through .socket unit files. Today guest agents
are stand-alone daemons running inside guests that do not take advantage
of systemd socket activation.
|
|
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
|
|
"%d (%m) %s\n" crashes asan: https://github.com/google/sanitizers/issues/759
So, let's place %m after %s
Fixes:
```
$ ./libtool --mode=execute ./test-selinux
...
============ test_misc ==========
ASAN:DEADLYSIGNAL
=================================================================
==2981==ERROR: AddressSanitizer: SEGV on unknown address 0x000041b58ab3 (pc 0x7fd9c55a0eb2 bp 0x7fffdc2f9640 sp 0x7fffdc2f8d68 T0)
#0 0x7fd9c55a0eb1 (/lib64/libasan.so.3+0xdeeb1)
#1 0x7fd9c5550bbf (/lib64/libasan.so.3+0x8ebbf)
#2 0x7fd9c5552cdd in __interceptor_vsnprintf (/lib64/libasan.so.3+0x90cdd)
#3 0x7fd9c5063715 in log_internalv src/basic/log.c:680
#4 0x7fd9c506390a in log_internal src/basic/log.c:697
#5 0x561d398181a2 in test_misc src/test/test-selinux.c:81
#6 0x561d398185e8 in main src/test/test-selinux.c:117
#7 0x7fd9c493a400 in __libc_start_main (/lib64/libc.so.6+0x20400)
#8 0x561d39817859 in _start (/home/vagrant/systemd-asan/.libs/lt-test-selinux+0x1859)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib64/libasan.so.3+0xdeeb1)
==2981==ABORTING
```
|
|
This makes sure we can invoke it safely from out "mkosi.build" script
when mkosi is invoked for a read-only image.
|
|
Also, add tests to make sure this actually works as intended.
|
|
If a hex string has an uneven length, generate an error instead of
silently assuming a trailing '0' was in place.
|
|
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.
In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
|
|
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
|
|
Check if the parsed seconds value fits in an integer *after*
multiplying by USEC_PER_SEC, otherwise a large value can trigger
modulo by zero during normalization.
|
|
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
|
|
Let's use chase_symlinks() when looking for /etc/os-release and
/usr/lib/os-release as these files might be symlinks (and actually are IRL on
some distros).
|
|
calendarspec: allow repetition values with ranges
|
|
|
|
This means that callers can distiguish an error from flags==0,
and don't have to special-case the empty string.
|
|
"Every other hour from 9 until 5" can be written as
`9..17/2:00` instead of `9,11,13,15,17:00`
|
|
PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
|
|
Fixes:
```
$ ./libtool --mode=execute valgrind --leak-check=full ./test-fs-util
...
==22871==
==22871== 27 bytes in 1 blocks are definitely lost in loss record 1 of 1
==22871== at 0x4C2FC47: realloc (vg_replace_malloc.c:785)
==22871== by 0x4E86D05: strextend (string-util.c:726)
==22871== by 0x4E8F347: chase_symlinks (fs-util.c:712)
==22871== by 0x109EBF: test_chase_symlinks (test-fs-util.c:75)
==22871== by 0x10C381: main (test-fs-util.c:305)
==22871==
```
Closes #4888
|
|
set up a per-service session kernel keyring, and store the invocation ID in it
|
|
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.
The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).
Fixes: #3439
|
|
This makes "systemd-run -p MountFlags=shared -t /bin/sh" work, by making
MountFlags= to the list of properties that may be accessed transiently.
|
|
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as they key cannot be overidden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.
The invocation ID is now passed in three different ways to a service:
- As environment variable $INVOCATION_ID. This is easy to use, but may be
overriden by unprivileged code (which might be a bad or a good thing), which
means it's incompatible with SUID code (see above).
- As extended attribute on the service cgroup. This cannot be overriden by
unprivileged code, and may be queried safely from "outside" of a service.
However, it is incompatible with containers right now, as unprivileged
containers generally cannot set xattrs on cgroupfs.
- As "invocation_id" key in the kernel keyring. This has the benefit that the
key cannot be changed by unprivileged service code, and thus is safe to
access from SUID code (see above). But do note that service code can replace
the session keyring with a fresh one that lacks the key. However in that case
the key will not be owned by root, which is easily detectable. The keyring is
also incompatible with containers right now, as it is not properly namespace
aware (but this is being worked on), and thus most container managers mask
the keyring-related system calls.
Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.
How to verify the new invocation ID in the keyring:
# systemd-run -t /bin/sh
Running as unit: run-rd917366c04f847b480d486017f7239d6.service
Press ^] three times within 1s to disconnect TTY.
# keyctl show
Session Keyring
680208392 --alswrv 0 0 keyring: _ses
250926536 ----s-rv 0 0 \_ user: invocation_id
# keyctl request user invocation_id
250926536
# keyctl read 250926536
16 bytes of data in key:
9c96317c ac64495a a42b9cd7 4f3ff96b
# echo $INVOCATION_ID
9c96317cac64495aa42b9cd74f3ff96b
# ^D
This creates a new transient service runnint a shell. Then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
|
|
Various specifier resolution fixes.
|
|
Generalize image dissection logic of nspawn, and make it useful for other tools.
|
|
Add new "khash" API and add new sd_id128_get_machine_app_specific() function
|
|
|
|
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.
Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.
mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:
```
# mkdir test
# cd test
# mkosi --verity
# systemd-nspawn -i ./image.raw -bn
```
Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
|
|
This adds two new APIs to systemd:
- loop-util.h is a simple internal API for allocating, setting up and releasing
loopback block devices.
- dissect-image.h is an internal API for taking apart disk images and figuring
out what the purpose of each partition is.
Both APIs are basically refactored versions of similar code in nspawn. This
rework should permit us to reuse this in other places than just nspawn in the
future. Specifically: to implement RootImage= in the service image, similar to
RootDirectory=, but operating on a disk image; to unify the gpt-auto-discovery
generator code with the discovery logic in nspawn; to add new API to machined
for determining the OS version of a disk image (i.e. not just running
containers). This PR does not make any such changes however, it just provides
the new reworked API.
The reworked code is also slightly more powerful than the nspawn original one.
When pointing it to an image or block device with a naked file system (i.e. no
partition table) it will simply make it the root device.
|
|
"*:*" should be equivalent to "*-*-* *:*:00" (minutely)
rather than running every microsecond.
Fixes #4804
|
|
Let's accept "µs" as alternative time unit for microseconds. We already accept
"us" and "usec" for them, lets extend on this and accept the proper scientific
unit specification too.
We will never output this as time unit, but it's fine to accept it, after all
we are pretty permissive with time units already.
|
|
As suggested by @keszybz
|
|
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
|