Age | Commit message (Collapse) | Author |
|
|
|
Systemd controller on unified v2
|
|
If our netlink input buffer overruns the kernel will send us ENOBUFS on
the next recvmsg(). Don't consider this a complete failure resulting in
closing of the netlink socket. Instead, simply continue (after debug
logging).
Of course, ideally we'd have a better strategy for this, and would have
a way to resync if this happens (as well as a scheme for cancelling all
ongoing asynchronous transactions), but for now let's at least not choke
fatally, and simply accept that we lost some messages and continue.
Note that if we lose messages when synchronously waiting for an
operation to complete, we'll still propagate the ENOBUFS up, to make the
individual transaction fail.
See: #5398
(This bug does not properly fix the issue, hence we should leave the bug
open.)
|
|
Coverity was complaining about TOCTOU (CID #745806). Indeed, it seems better
to open the file and avoid the stat altogether:
- O_NOFOLLOW means we'll get ELOOP, which we can translate to EINVAL as before,
- similarly, open(O_WRONLY) on a directory will fail with EISDIR,
- and finally, it makes no sense to check access mode ourselves: just let
the kernel do it and propagate the error.
v2:
- fix memleak, don't clober input arg
|
|
cg_[all_]unified() test whether a specific controller or all controllers are on
the unified hierarchy. While what's being asked is a simple binary question,
the callers must assume that the functions may fail any time, which
unnecessarily complicates their usages. This complication is unnecessary.
Internally, the test result is cached anyway and there are only a few places
where the test actually needs to be performed.
This patch simplifies cg_[all_]unified().
* cg_[all_]unified() are updated to return bool. If the result can't be
decided, assertion failure is triggered. Error handlings from their callers
are dropped.
* cg_unified_flush() is updated to calculate the new result synchrnously and
return whether it succeeded or not. Places which need to flush the test
result are updated to test for failure. This ensures that all the following
cg_[all_]unified() tests succeed.
* Places which expected possible cg_[all_]unified() failures are updated to
call and test cg_unified_flush() before calling cg_[all_]unified(). This
includes functions used while setting up mounts during boot and
manager_setup_cgroup().
|
|
(#5271)
The code make the following assertion: when freeing a event loop object
(usually it's done after exiting from the main event loop), no signal events
are still queued and are pending.
This assertion can be found in event_unmask_signal_data() with
"assert(!d->current);" assertion.
It appears that this assertion can be wrong at least in a specific case
described below.
Consider the following example which is inspired from udev: a process defines 3
source events: 2 are created by sd_event_add_signal() and 1 is created by
sd_event_add_post().
1. the process receives the 2 signals consecutively so that signal 'A' source
event is queued and pending. Consequently the post source event is also
queued and pending. This is done by sd_event_wait().
2. The callback for signal 'A' is called by sd_event_dispatch().
3. The next call to sd_event_wait() will queue signal 'B' source event.
4. The callback for the post source event is called and calls sd_event_exit().
5. the event loop is exited.
6. freeing the event loop object will lead to the assertion failure in
event_unmask_signal_data().
This patch simply removes this assertion as it doesn't seem to be a
bug if the signal data still reference a signal source at this point.
|
|
Let's add an extra safety check: before entering a reload/reexec, let's
verify that there's enough room in /run for it.
Fixes: #5016
|
|
If a callback of an event source returns an error, then the event source
might already be half-destroyed, if the callback dropped all refs.
Hence, don't assume that the type is still valid, and save it before we
issue the callback.
|
|
76ec966f0e33685f833 changed the code from ESHUTDOWN to ERFKILL, but missed one
spot in bus-common-errors.c. Fix that.
The code in transaction.c was checking for ERFKILL, but I'm not sure if this
mismatch had any effect, i.e. if there were any code paths in which the wrong
code actually made difference.
Also add comments when ESHUTDOWN is used in the journal code, so it's easy to
distinguish those cases when grepping. Standarize on the same capitalization.
(There's also a bunch of uses in sd-bus.c, but that's clearly different.)
|
|
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.
(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
|
|
Fixes #1188.
|
|
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as they key cannot be overidden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.
The invocation ID is now passed in three different ways to a service:
- As environment variable $INVOCATION_ID. This is easy to use, but may be
overriden by unprivileged code (which might be a bad or a good thing), which
means it's incompatible with SUID code (see above).
- As extended attribute on the service cgroup. This cannot be overriden by
unprivileged code, and may be queried safely from "outside" of a service.
However, it is incompatible with containers right now, as unprivileged
containers generally cannot set xattrs on cgroupfs.
- As "invocation_id" key in the kernel keyring. This has the benefit that the
key cannot be changed by unprivileged service code, and thus is safe to
access from SUID code (see above). But do note that service code can replace
the session keyring with a fresh one that lacks the key. However in that case
the key will not be owned by root, which is easily detectable. The keyring is
also incompatible with containers right now, as it is not properly namespace
aware (but this is being worked on), and thus most container managers mask
the keyring-related system calls.
Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.
How to verify the new invocation ID in the keyring:
# systemd-run -t /bin/sh
Running as unit: run-rd917366c04f847b480d486017f7239d6.service
Press ^] three times within 1s to disconnect TTY.
# keyctl show
Session Keyring
680208392 --alswrv 0 0 keyring: _ses
250926536 ----s-rv 0 0 \_ user: invocation_id
# keyctl request user invocation_id
250926536
# keyctl read 250926536
16 bytes of data in key:
9c96317c ac64495a a42b9cd7 4f3ff96b
# echo $INVOCATION_ID
9c96317cac64495aa42b9cd74f3ff96b
# ^D
This creates a new transient service runnint a shell. Then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
|
|
|
|
Udev property ordering
|
|
|
|
Add new "khash" API and add new sd_id128_get_machine_app_specific() function
|
|
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
|
|
We have only two callers, and for neither this "optimization" is useful.
So let's drop it an save some code and a malloc.
|
|
We cannot compare filenames directly, because paths are not sortable
lexicographically, e.g. /etc/udev is "later" (has higher priority)
than /usr/lib/udev.
The on-disk format is changed to have a separate field for "file priority",
which is stored when writing the binary file, and then loaded and used in
comparisons. For data in the previous format (as generated by systemd 232),
this information is not available, and we use a trick where the offset into the
string table is used as a proxy for priority. Most of the time strings are
stored in the order in which the files were processed. This is not entirely
reliable, but is good enough to properly order /usr/lib and /etc/, which are
the two most common cases. This hack is included because it allows proper
parsing of files until the binary hwdb is regenerated.
Instead of adding a new field, I reduced the size of line_number from 64 to 32
bits, and added a 16 bit priority field, and 16 bits of padding. Adding a new
field of 16 bytes would significantly screw up alignment and increase file
size, and line number realistically don't need more than ~20 bits.
Fixes #4750.
|
|
|
|
This adds an API for retrieving an app-specific machine ID to sd-id128.
Internally it calculates HMAC-SHA256 with an 128bit app-specific ID as payload
and the machine ID as key.
(An alternative would have been to use siphash for this, which is also
cryptographically strong. However, as it only generates 64bit hashes it's not
an obvious choice for generating 128bit IDs.)
Fixes: #4667
|
|
This patch handles the custom MTU field in IPv6 RA.
fixes RFE #4464
|
|
Fixes: #4721
|
|
To properly store priority in passed in pointer and return 0 for success.
Also add a test for verifying that it works correctly.
|
|
extract_first_words deals fine with the string being NULL, so drop the upfront
check for that.
|
|
busctl introspect: accept direction="out" for signals.
|
|
|
|
According to the D-Bus spec (v0.29),
| The direction element on <arg> may be omitted, in which case it
| defaults to "in" for method calls and "out" for signals. Signals only
| allow "out" so while direction may be specified, it's pointless.
Therefore we still should accept a 'direction' attribute, even if it's
useless in reality.
Closes: #4616
|
|
Format string tweaks (and a small fix on 32bit)
|
|
The .so symlinks got moved to rootlibdir in 082210c7.
|
|
According to comments in <asm/types.h>, __u64 is always defined as unsigned
long long. Those casts should be superfluous.
|
|
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
|
|
This makes strjoin and strjoina more similar and avoids the useless final
argument.
spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)
git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'
This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
|
|
|
|
|
|
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
|
|
We generate these, hence we should also add errno translations for them.
|
|
These were forgotten, let's add some useful mappings for all errors we define.
|
|
As suggested here:
https://github.com/systemd/systemd/pull/4296#issuecomment-251911349
Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that
we can use a protocol-independent socket here if possible. This has the benefit
that our code will still work even if AF_INET/AF_INET6 is made unavailable (for
exmple via seccomp), at least on current kernels.
|
|
hwdb: return conflicts in a well-defined order
|
|
This test sometimes fails in semaphore, but not when run interactively,
so it's hard to debug.
|
|
|
|
If we find duplicates in a property-lookup, make sure to order them by
their origin. That is, matches defined "later" take precedence over
earlier matches. The "later"-order is defined by file-name + line-number
combination. That is, if a match is defined below another one in the
same hwdb file, it takes precedence, same as if it is defined in a file
ordered after another one.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
|
|
Extend the hwdb to store the source file-name and file-number for each
property. We simply extend the stored value struct with the new
information. It is fully backwards compatible and old readers will
continue to work.
The libudev/sd-hwdb reader is updated in a followup.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
|
|
It is not legal to use hard-coded types to calculate offsets. We must
always use the offsets of the hwdb header to calculate those. Otherwise,
we will break horribly if run on hwdb files written by other
implementations or written with future extensions.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
|
|
1. add support for kind vcan
2. fixup indention netlink-types.c, networkd-netdev.c
|
|
|
|
Let's bump it further, as this the current limit turns out to be problematic
IRL. Let's bump it to more than twice what we know of is needed.
Fixes: #4068
|
|
Old libdbus has a feature that the process is terminated whenever the the bus
connection receives a disconnect. This is pretty useful on desktop apps (where
a disconnect indicates session termination), as well as on command line apps
(where we really shouldn't stay hanging in most cases if dbus daemon goes
down).
Add a similar feature to sd-bus, but make it opt-in rather than opt-out, like
it is on libdbus. Also, if the bus is attached to an event loop just exit the
event loop rather than the the whole process.
|
|
This tests in particular that disconnecting results in the tracking object's
handlers to be called.
|