Age | Commit message (Collapse) | Author |
|
|
|
The log forward levels can be configured through kernel command line.
|
|
Now that determine_space_for() only deals with storage space (cached) values,
rename it so it reflects the fact that only the cached storage space values are
updated.
|
|
Updating min_use is rather an unusual operation that is limited when we first
open the journal files, therefore extracts it from determine_space_for() and
create a function of its own and call this new function when needed.
determine_space_for() is now dealing with storage space (cached) values only.
There should be no functional changes.
|
|
Introduce a dedicated helper in order to reset the storage space cache.
|
|
The set of storage space values we cache are calculated according to a couple
of filesystem statistics (free blocks, block size).
This patch caches the vfs stats we're interested in so these values are
available later and coherent with the rest of the space cached values.
|
|
This patch makes system_journal_open() stop emitting the space usage
message. The caller is now free to emit this message when appropriate.
When restarting the journal, we can now emit the message *after*
flushing the journal (if required) so that all flushed log entries are
written in the persistent journal *before* the status message.
This is required since the status message is always younger than the
flushed entries.
Fixes #4190.
|
|
This commit simply extracts from determine_space_for() the code which emits the
storage usage message and put it into a function of its own so it can be reused
by others paths later.
No functional changes.
|
|
This structure keeps track of specificities for a given journal type
(persistent or volatile) such as metrics, name, etc...
The cached space values are now moved in this structure so that each
journal has its own set of cached values.
Previously only one set existed and we didn't know if the cached
values were for the runtime journal or the persistent one.
When doing:
determine_space_for(s, runtime_metrics, ...);
determine_space_for(s, system_metrics, ...);
the second call returned the cached values for the runtime metrics.
|
|
This commit simply extracts from determine_space_for() the code which
determines the FS usage where the passed path lives (statvfs(3)) and put it
into a function of its own so it can be reused by others paths later.
No functional changes.
|
|
|
|
Never permit that we write to journal files that have newer timestamps than our
local wallclock has. If we'd accept that, then the entries in the file might
end up not being ordered strictly.
Let's refuse this with ETXTBSY, and then immediately rotate to use a new file,
so that each file remains strictly ordered also be wallclock internally.
|
|
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.
This should help avoiding borked wallclock-based bisection on journal files as
witnessed in #4278.
|
|
Let's use the earliest linearized event timestamp for journal entries we have:
the event dispatch timestamp from the event loop, instead of requerying the
timestamp at the time of writing.
This makes the time a bit more accurate, allows us to query the kernel time one
time less per event loop, and also makes sure we always use the same timestamp
for both attempts to write an entry to a journal file.
|
|
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
|
|
|
|
We are already attaching the system slice information to log messages, now add
theuser slice info too, as well as the object slice info.
|
|
|
|
In preparation for adding a version which takes a strv.
|
|
Minor cleanup suggested by Lennart.
|
|
When the system journal becomes re-opened post-flush with the runtime
journal open, it implies we've recovered from something like an ENOSPC
situation where the system journal rotate had failed, leaving the system
journal closed, causing the runtime journal to be opened post-flush.
For the duration of the unavailable system journal, we log to the
runtime journal. But when the system journal gets opened (space made
available, for example), we need to close the runtime journal before new
journal writes will go to the system journal. Calling
server_flush_to_var() after opening the system journal with a runtime
journal present, post-flush, achieves this while preserving the runtime
journal's contents in the system journal.
The combination of the present flushed flag file and the runtime journal
being open is a state where we should be logging to the system journal,
so it's appropriate to resume doing so once we've successfully opened
the system journal.
|
|
Dynamic users should be treated like system users, and their logs
should end up in the main system journal.
|
|
If journals get into a closed state like when rotate fails due to
ENOSPC, when space is made available it currently goes unnoticed leaving
the journals in a closed state indefinitely.
By calling system_journal_open() on entry to find_journal() we ensure
the journal has been opened/created if possible.
Also moved system_journal_open() up to after open_journal(), before
find_journal().
Fixes https://github.com/systemd/systemd/issues/3968
|
|
It's a bit easier to read because shorter. Also, most likely a tiny bit faster.
|
|
https://github.com/SELinuxProject/selinux/commit/9eb9c9327563014ad6a807814e7975424642d5b9
deprecated selinux_context_t. Replace with a simple char* everywhere.
Alternative fix for #3719.
|
|
|
|
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
|
|
Also, expose this via the "journalctl --file=-" syntax for STDIN. This feature
remains undocumented though, as it is probably not too useful in real-life as
this still requires fds that support mmaping and seeking, i.e. does not work
for pipes, for which reading from STDIN is most commonly used.
|
|
|
|
The comments and the log messages are next to one another, so it's easier
to check that the messages match the comments.
The sign was omitted in the check for -ESHUTDOWN, so it was never matched.
|
|
|
|
When we rotate journals, we must set offline and close the current one,
but don't generally need to wait for this to complete.
Instead, we'll initiate an asynchronous offline via
journal_file_set_offline(oldfile, false), and add the file to a
per-server set of deferred closes to be closed later when they
won't block.
There's one complication however; journal_file_open() via
journal_file_verify_header() assumes that any writable journal in the
online state is the product of an unclean shutdown or other form of
corruption.
Thus there's a need for journal_file_open() to be aware of deferred
closes and synchronize with their completion when opening preexisting
journals for writing. To facilitate this the deferred closes set is
supplied to the journal_file_open() function where the deferred closes
may be closed synchronously before verifying the header in such
circumstances.
|
|
This adds a wait flag to journal_file_set_offline(), when false the offline is
performed asynchronously in a separate thread.
When wait is true, if an asynchronous offline is already in-progress it is
restarted and waited for. Otherwise the offline is performed synchronously
without the use of a thread.
journal_file_set_online() cancels or waits for the asynchronous offline to
complete if in-flight, depending on where in the offline process the thread
happens to be. If the thread is in the fsync() phase, it is cancelled and
waiting is unnecessary. Otherwise, the thread is joined before proceeding.
A new offline_state member is added to JournalFile which is used via
atomic operations for communicating between the offline thread and the
journal_file_set_{offline,online}() functions.
|
|
|
|
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
|
|
None of the callers take advantage of this parameter, it's always NULL,
this is just a private helper function to simplify the call sites so
drop the template parameter altogether. If a caller emerges later who
needs it, it can be restored.
|
|
journald: minor fixes
|
|
Whenever we include a log level or facility in a journal string field, make sure the compiler checks for us that that's
actually the right thing to do.
|
|
Journald disk usage
|
|
This primarily contains some minor coding style fixups for 7a24f3bf2fb181243a1957a0cdd54cd919396793 and earlier changes. Specifically:
* Don't log at log levels above LOG_DEBUG from "library" code like journal-file.c
* Don't negate errno values before passing them to log_debug_errno(), as the call can handle this fine anyway
* Cast some calls we knowingly ignore the return values of to (void)
* Don't clobber function call-by-ref return values on failure
* Don't mix function calls and variable declarations in one line
There's also one more relevant change: when failing to enqueue a journal change fs event, we'll run it immediately.
|
|
v2:
- use xsprintf
|
|
journal: coalesce ftruncate()s in 250ms windows
|
|
The format of the journald disk usage log entry was changed back and
forth a few times. It is annoying to have a very verbose message, but
if it is short it is hard to understand. But we have a tool for this,
the catalogue.
$ journalctl -x -u systemd-journald
Jan 23 18:48:50 rawhide systemd-journald[891]: Runtime journal (/run/log/journal/) is 8.0M, max 196.2M, 188.2M free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Runtime journal (/run/log/journal/) is currently using 8.0M.
-- Maximum allowed usage is set to 196.2M.
-- Leaving at least 294.3M free (of currently available 1.9G of disk space).
-- Enforced usage limit is thus 196.2M, of which 188.2M are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
Jan 23 18:48:50 rawhide systemd-journald[891]: System journal (/var/log/journal/) is 480.1M, max 1.6G, 1.2G free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- System journal (/var/log/journal/) is currently using 480.1M.
-- Maximum allowed usage is set to 1.6G.
-- Leaving at least 2.5G free (of currently available 5.8G of disk space).
-- Enforced usage limit is thus 1.6G, of which 1.2G are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
|
|
The code to format the iovec is shared with log.c. All call sites to
server_driver_message are changed to include the additional "MESSAGE="
part, but the new functionality is not used and change in functionality
is not expected.
iovec is preallocated, so the maximum number of messages is limited.
In server_driver_message N_IOVEC_PAYLOAD_FIELDS is currently set to 1.
New code is not oom safe, it will fail if memory cannot be allocated.
This will be fixed in subsequent commit.
|
|
|
|
Prior to this change every journal append causes an ftruncate() for the
sake of inotify propagation of the mmap-based writes.
With this change the notification is deferred up to ~250ms, coalescing
any repeated journal writes during the deferred period into a single
ftruncate(). The ftruncate() call isn't free and doing it on every
append adds unnecessary overhead and latency in the journald event loop.
Introduces journal_file_enable_post_change_timer() which manages a
timer on the provided sd-event instance for scheduling coalesced
ftruncates. The ftruncate() behavior is unchanged unless
journal_file_enable_post_change_timer() is called on the JournalFile.
While not a tremendous improvement, profiling systemd-journald event loop
latencies using instrumentation as introduced by 34b8751 it was observed that
coalescing the ftruncates was low-hanging fruit worth pursuing.
Note orders 12 and 13 shifting left into order 11 and order 6 dipping into
order 5:
Unmodified:
log2(us) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-----------------------------------------------------------
[10685.414572] 0 0 0 0 38 602 61 2 290 60 1643 2554 13 1 4 1 0 0 1
[10690.415114] 0 0 0 0 0 646 54 7 309 44 2073 2148 17 1 3 0 0 0 1
[10695.415509] 0 0 0 0 1 650 73 3 324 37 2071 2270 9 0 0 1 0 1 0
[10700.416297] 0 0 0 0 0 659 50 4 318 38 2111 2152 6 0 1 0 0 1 1
[10705.417136] 0 0 0 0 2 660 48 4 320 38 2129 2146 12 1 1 0 0 1 1
[10710.489114] 0 0 0 0 0 673 38 3 321 37 1925 2339 7 0 0 0 0 1 1
[10715.489613] 0 0 0 0 3 656 64 8 317 48 2365 2007 7 0 0 0 0 0 1
Coalesced:
log2(us) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-----------------------------------------------------------
[ 6169.161360] 0 0 0 1 24 786 54 11 389 24 4192 771 6 4 0 0 1 0 1
[ 6174.161705] 0 0 0 1 18 800 35 6 380 27 3977 893 3 1 0 0 1 0 1
[ 6179.162741] 0 0 0 1 28 768 51 4 391 16 3998 831 5 3 0 0 0 0 2
[ 6184.162856] 0 0 0 0 19 770 60 2 376 26 3795 1004 9 5 1 0 1 0 1
[ 6189.163279] 0 0 0 0 28 761 49 7 372 27 3729 1056 3 2 0 0 1 0 1
[ 6194.164255] 0 0 0 0 25 785 49 7 394 19 3996 908 6 3 2 0 0 0 1
[ 6199.164658] 0 0 0 0 29 797 35 5 389 18 3995 898 3 4 1 1 1 0 1
The remaining high-order delays are a result of the synchronous fsyncs in
systemd-journald, beyond the scope of this commit.
|
|
Two unrelated fixes
|
|
Most of the function is moved to acl-util.c to make it possible to
add tests in subsequent commit.
Setting of the mode in server_fix_perms is removed:
- we either just created the file ourselves, and the permission be better right,
- or the file was already there, and we should not modify the permissions.
server_fix_perms is renamed to server_fix_acls to better reflect new
meaning, and made static because it is only used in one file.
|
|
Let's distuingish the cases where our code takes an active role in
selinux management, or just passively reports whatever selinux
properties are set.
mac_selinux_have() now checks whether selinux is around for the passive
stuff, and mac_selinux_use() for the active stuff. The latter checks the
former, plus also checks UID == 0, under the assumption that only when
we run priviliged selinux management really makes sense.
Fixes: #1941
|
|
tree-wide: group include of libudev.h with sd-*
|