Age | Commit message (Collapse) | Author |
|
Fixes: #2304
|
|
Journald disk usage
|
|
coredump: fix bug that loses core dump files when core dumps are compressed and disk space is low.
|
|
and disk space is low.
Previously the save_external_coredump function returned a file
descriptor corresponding to the dumped file. This descriptor was used
for two different purposes by calling code: a) access to the raw core
dump data; b) testing candidate files (via inode comparisons) while
vacuuming to protect the current core dump from vacuuming.
The descriptor returned always corresponded to a file containing the raw
core dump data. However if compresson was used and the core dump was
compressed then the descriptor returned did not correspond to the file
that would eventually be left on disk (ie the compressed file). Thus
the file was never protected by vacuuming. When disk space was low all
core dumps including the current one would be vacuumed and the
corresponding log message referred to a file that no longer existed.
This resulted in the following error message from coredumpctl if the
missing core dump was requested:
Cannot retrieve coredump from journal nor disk.
Failed to retrieve core: No such file or directory
save_external_coredump now returns two descriptors, one to be used for
inode comparisons to prevent overzealous vacuuming and one to be used
for raw data access. When compression is not used the returned inode
comparison descriptor will be invalid, indicating that the raw data
access descriptor should be used for inode comparisons as well.
Corresponding use of save_external_coredump and the returned
descriptors also updated.
|
|
v2:
- use xsprintf
|
|
journal: coalesce ftruncate()s in 250ms windows
|
|
The format of the journald disk usage log entry was changed back and
forth a few times. It is annoying to have a very verbose message, but
if it is short it is hard to understand. But we have a tool for this,
the catalogue.
$ journalctl -x -u systemd-journald
Jan 23 18:48:50 rawhide systemd-journald[891]: Runtime journal (/run/log/journal/) is 8.0M, max 196.2M, 188.2M free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Runtime journal (/run/log/journal/) is currently using 8.0M.
-- Maximum allowed usage is set to 196.2M.
-- Leaving at least 294.3M free (of currently available 1.9G of disk space).
-- Enforced usage limit is thus 196.2M, of which 188.2M are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
Jan 23 18:48:50 rawhide systemd-journald[891]: System journal (/var/log/journal/) is 480.1M, max 1.6G, 1.2G free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- System journal (/var/log/journal/) is currently using 480.1M.
-- Maximum allowed usage is set to 1.6G.
-- Leaving at least 2.5G free (of currently available 5.8G of disk space).
-- Enforced usage limit is thus 1.6G, of which 1.2G are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
|
|
The code to format the iovec is shared with log.c. All call sites to
server_driver_message are changed to include the additional "MESSAGE="
part, but the new functionality is not used and change in functionality
is not expected.
iovec is preallocated, so the maximum number of messages is limited.
In server_driver_message N_IOVEC_PAYLOAD_FIELDS is currently set to 1.
New code is not oom safe, it will fail if memory cannot be allocated.
This will be fixed in subsequent commit.
|
|
Remove the old version of the lz4 stream compressor
|
|
... to determine if color output should be enabled. If the variable is not set,
fall back to using on_tty(). Also, rewrite existing code to use
colors_enabled() where appropriate.
|
|
|
|
Prior to this change every journal append causes an ftruncate() for the
sake of inotify propagation of the mmap-based writes.
With this change the notification is deferred up to ~250ms, coalescing
any repeated journal writes during the deferred period into a single
ftruncate(). The ftruncate() call isn't free and doing it on every
append adds unnecessary overhead and latency in the journald event loop.
Introduces journal_file_enable_post_change_timer() which manages a
timer on the provided sd-event instance for scheduling coalesced
ftruncates. The ftruncate() behavior is unchanged unless
journal_file_enable_post_change_timer() is called on the JournalFile.
While not a tremendous improvement, profiling systemd-journald event loop
latencies using instrumentation as introduced by 34b8751 it was observed that
coalescing the ftruncates was low-hanging fruit worth pursuing.
Note orders 12 and 13 shifting left into order 11 and order 6 dipping into
order 5:
Unmodified:
log2(us) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-----------------------------------------------------------
[10685.414572] 0 0 0 0 38 602 61 2 290 60 1643 2554 13 1 4 1 0 0 1
[10690.415114] 0 0 0 0 0 646 54 7 309 44 2073 2148 17 1 3 0 0 0 1
[10695.415509] 0 0 0 0 1 650 73 3 324 37 2071 2270 9 0 0 1 0 1 0
[10700.416297] 0 0 0 0 0 659 50 4 318 38 2111 2152 6 0 1 0 0 1 1
[10705.417136] 0 0 0 0 2 660 48 4 320 38 2129 2146 12 1 1 0 0 1 1
[10710.489114] 0 0 0 0 0 673 38 3 321 37 1925 2339 7 0 0 0 0 1 1
[10715.489613] 0 0 0 0 3 656 64 8 317 48 2365 2007 7 0 0 0 0 0 1
Coalesced:
log2(us) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-----------------------------------------------------------
[ 6169.161360] 0 0 0 1 24 786 54 11 389 24 4192 771 6 4 0 0 1 0 1
[ 6174.161705] 0 0 0 1 18 800 35 6 380 27 3977 893 3 1 0 0 1 0 1
[ 6179.162741] 0 0 0 1 28 768 51 4 391 16 3998 831 5 3 0 0 0 0 2
[ 6184.162856] 0 0 0 0 19 770 60 2 376 26 3795 1004 9 5 1 0 1 0 1
[ 6189.163279] 0 0 0 0 28 761 49 7 372 27 3729 1056 3 2 0 0 1 0 1
[ 6194.164255] 0 0 0 0 25 785 49 7 394 19 3996 908 6 3 2 0 0 0 1
[ 6199.164658] 0 0 0 0 29 797 35 5 389 18 3995 898 3 4 1 1 1 0 1
The remaining high-order delays are a result of the synchronous fsyncs in
systemd-journald, beyond the scope of this commit.
|
|
Compare errno with zero in a way that tells gcc that
(if the condition is true) errno is positive.
|
|
Also add a coccinelle receipt to help with such transitions.
|
|
Fix miscalculated buffer size and uses of size-unlimited sprintf()
|
|
function.
Not sure if this results in an exploitable buffer overflow, probably not
since the the int value is likely sanitized somewhere earlier and it's
being put through a bit mask shortly before being used.
|
|
The stream event source has a priority of SD_EVENT_PRIORITY_NORMAL+5,
and stdout source +10, but the native and syslog event sources are left
at the default of 0.
As a result, any heavy native or syslog logger can cause starvation of
the other loggers. This is trivially demonstrated by running:
dd if=/dev/urandom bs=8k | od | systemd-cat & # native spammer
systemd-run echo hello & # stream logger
journalctl --follow --output=verbose --no-pager --identifier=echo &
... and wait, and wait, the "hello" never comes.
Now kill %1, "hello" arrives finally.
|
|
Journal decompression fixes
|
|
This was the case that caused various problems that were fixed in
preceding patches, so it is good to add a test that uses it directly.
In "may_fail" test cases try again with a bigger buffer.
Instead of allocating various buffers on the stack, malloc them.
This is more reliable in case of big buffers, and allows tools like
valgrind and address sanitizer to find overflows more easily.
|
|
Add a test that LZ4_decompress_safe_partial does (not) work as
expected, so that if it starts to work at some point, we'll catch
this and adjust our code.
|
|
The header is 7 bytes, and this size was not accounted for in
total_out. This means that we could create a file that was 7 bytes
longer than requested, and the debug output was also inconsistent.
|
|
compress_blob took src, src_size, dst and *dst_size, but dst_size
wasn't used as an input parameter with the size of dst, but only as an
output parameter. dst was implicitly assumed to be at least src_size-1.
This code wasn't *wrong*, because the only real caller in
journal-file.c got it right. But it was misleading, and the tests in
test-compress.c got it wrong, and worked only because the output
buffer happened to be the same size as input buffer. So add a seperate
dst_allocated_size parameter to make it explicit what the size of the
buffer is, and to allow test to proceed with different output buffer
sizes.
|
|
lz4 has to decompress a whole "sequence" at a time. When the compressed
data is composed of a repeating pattern, the whole set of repeats has
do be docompressed, and the output buffer has to be big enough.
This is unfortunate, because potentially the slowdown is very big. We
are only interested in the field name, but we might have to decompress
the whole thing. But the full cost will be borne out only when the
full entry is a repeating pattern. In practice this shouldn't happen
(apart from tests and the like). Hopefully lz4 will be fixed to avoid
this problem, or it will grow a new function which we can use [1], so
this fix should be remporary.
[1] https://groups.google.com/d/msg/lz4c/_3kkz5N6n00/oTahzqErCgAJ
|
|
The return value was used directly in an if, so an error was treated
as success; we need to bail out instead. An error should not happen,
unless we have a compression/decompression mismatch, so output a debug
line.
|
|
destructors
|
|
We treated -ENOENT errors with silent failure, for small messages.
Do the same for large messages.
|
|
Borked since
commit 3ee897d6c2401effbc82f5eef35fce405781d6c8
Author: Lennart Poettering <lennart@poettering.net>
Date: Wed Sep 23 01:00:04 2015 +0200
tree-wide: port more code to use send_one_fd() and receive_one_fd()
because here our fd is not connected and we need to specify
the address.
|
|
Also, check the return value of all calls.
They are documented to return 0, even if journald is not listening.
|
|
|
|
We only use the public api here, so don't include
log.h.
|
|
|
|
Two unrelated fixes
|
|
Most of the function is moved to acl-util.c to make it possible to
add tests in subsequent commit.
Setting of the mode in server_fix_perms is removed:
- we either just created the file ourselves, and the permission be better right,
- or the file was already there, and we should not modify the permissions.
server_fix_perms is renamed to server_fix_acls to better reflect new
meaning, and made static because it is only used in one file.
|
|
Let's distuingish the cases where our code takes an active role in
selinux management, or just passively reports whatever selinux
properties are set.
mac_selinux_have() now checks whether selinux is around for the passive
stuff, and mac_selinux_use() for the active stuff. The latter checks the
former, plus also checks UID == 0, under the assumption that only when
we run priviliged selinux management really makes sense.
Fixes: #1941
|
|
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
|
|
Fix stdout stream parsing
|
|
This is a continuation of the previous include sort patch, which
only sorted for .c files.
|
|
|
|
|
|
tree-wide: group include of libudev.h with sd-*
|
|
journalctl: don't print -- No entries -- in quiet mode
|
|
|
|
|
|
use them everywhere
|
|
|
|
|
|
|
|
siphash24: let siphash24_finalize() and siphash24() return the result…
|
|
Rather than passing a pointer to return the result, return it directly
from the function calls.
Also, return the result in native endianess, and let the callers care
about the conversion. For hash tables and bloom filters, we don't care,
but in order to keep MAC addresses and DHCP client IDs stable, we
explicitly convert to LE.
|
|
Sort the includes accoding to the new coding style.
|