Age | Commit message (Collapse) | Author |
|
|
|
|
|
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.
This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1045810
|
|
https://github.com/vlajos/misspell_fixer
https://github.com/torstehu/systemd/commit/b6fdeb618cf2f3ce1645b3315f15f482710c7ffa
Thanks to Torstein Husebo <torstein@huseboe.net>.
|
|
If OBJECT_PID= came as the last field, we would not reallocate the iovec to bigger size,
and fail the assertion later on in dispatch_message_real().
|
|
https://bugzilla.redhat.com/show_bug.cgi?id=1177184
|
|
|
|
There is alot of cleanup that will have to happen to turn on
-fstrict-aliasing, but I think our code should be "correct" to the rule.
|
|
EOF is meaningless if the direction of iteration changes.
Move the EOF optimization under the direction check.
This fixes test-journal-interleaving for me.
Thanks to Filipe Brandenburger for telling me about the failure.
|
|
next_with_matches() is odd in that its "unit64_t *offset" parameter is
both input and output. In other it's purely for output.
The function is called from two places in next_beyond_location(). In
both of them "&cp" is used as the argument and in both cases cp is
guaranteed to equal f->current_offset.
Let's just have next_with_matches() ignore "*offset" on input and
operate with f->current_offset.
I did not investigate why it is, but it makes my usual benchmark run
reproducibly faster:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m4.032s
user 0m3.896s
sys 0m0.135s
(Compare to preceding commit, where real was 4.4s.)
|
|
I accidentally broke the detection of duplicate entries in 7943f42275
"journal: optimize iteration by returning previously found candidate
entry".
When we have a known location of a candidate entry, we must not return
from next_beyond_location() immediately. We must go through the
duplicates detection to make sure the candidate differs from the
already iterated entry.
This fix slows down iteration a bit, but it's still faster than it
was before the rework.
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m4.448s
user 0m4.298s
sys 0m0.149s
(Compare with results from commit 7943f42275, where real was 5.3s before
the rework.)
|
|
Now that journal_file_next_entry() does not need a pointer to the
current object, next_with_matches() does not need it either.
|
|
The current offset is sufficient information.
|
|
In next_beyond_location() when the JournalFile's location type is
LOCATION_SEEK, it means there's nothing to do, because we already have
the location of the candidate entry. Do an early return. Note that now
next_beyond_location() does not anymore guarantee on return that the
entry is mapped, but previous patches made sure the caller does not
care.
This optimization is at least as good as "journal: optimize iteration:
skip files that cannot improve current candidate entry" was.
Timing results on my workstation, using:
$ time ./journalctl -q --since=2014-06-01 --until=2014-07-01 > /dev/null
Before "Revert "journal: optimize iteration: skip files that cannot
improve current candidate entry":
real 0m5.349s
user 0m5.166s
sys 0m0.181s
Now:
real 0m3.901s
user 0m3.724s
sys 0m0.176s
|
|
If from a previous iteration we know we are at the end of a journal
file, don't bother looking into the file again. This is complicated by
the fact that the EOF does not have to be permanent (think of
"journalctl -f"). So we also check if the number of entries in the
journal file changed.
This optimization has a similar effect as "journal: optimize iteration:
skip whole files behind current location" had.
|
|
offset is redundant, because the caller can rely on f->current_offset.
The object pointer the function saves in *ret is thrown away by the caller.
|
|
The file's current_offset is already updated at this point, so let's use
it.
|
|
When comparing the locations of candidate entries, we can rely on the
location information stored in struct JournalFile.
|
|
set_location() is called from real_journal_next() when a winning entry
has been picked from among the candidates in journal files.
The location type is always set to LOCATION_DISCRETE. No need to pass
it as a parameter.
The per-JournalFile location information is already updated at this
point. No need for having the direction and offset here.
|
|
In next_beyond_location() when we find a candidate entry in a journal
file, save its location information in struct JournalFile.
The purpose of remembering the locations of candidate entries is to be
able to save work in the next iteration. This patch does only the
remembering part.
LOCATION_SEEK means the location identifies a candidate entry.
When a winner is picked from among candidates, it becomes
LOCATION_DISCRETE.
LOCATION_TAIL here signifies we've iterated the file to the end (or the
beginning in the case of reversed direction).
|
|
|
|
In preparation for individual JournalFiles maintaining a location
of their own.
|
|
This reverts commit b7c88ab8cc7d55a43450bf3dea750f95f2e910d6.
This optimization will be made redundant by the following patches.
|
|
candidate entry"
This reverts commit f8b5a3b75fb55f0acb85c21424b3893c822742e9.
This optimization will be made redundant by the following patches.
|
|
Its only caller is a test.
|
|
|
|
try_context() is such a hot path that the hashmap lookup is expensive.
The number of contexts is small - it is the number of object types.
Using a hashmap is overkill. A plain array will do.
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m9.445s
user 0m9.228s
sys 0m0.213s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m5.438s
user 0m5.266s
sys 0m0.170s
|
|
This never had any callers. Contexts are freed when the MMapCache is
freed.
|
|
|
|
|
|
Note that numbers 0 and -1 are both replaced with OBJECT_UNUSED,
because they are treated the same everywhere (e.g. type_to_context()
translates them both to 0).
|
|
If type==0 and a non-NULL object were given as arguments to
journal_file_hmac_put_object(), its object type check would fail and it
would return -EBADMSG.
All existing callers use either a positive type or -1. Still, for
behavior consistency with journal_file_move_to_object() let's allow
type 0 to pass.
|
|
It has no other callers. It does not need to be in the header file.
|
|
The only user is sd_journal_enumerate_unique() and, as explained in
the previous commit (fed67c38e3 "journal: map objects to context set by
caller, not by actual object type"), the use of them there is now
superfluous. Let's remove them.
This reverts major parts of commits:
ae97089d49 journal: fix access to munmapped memory in
sd_journal_enumerate_unique
06cc69d44c sd-journal: fix sd_journal_enumerate_unique skipping values
Tested with an "--enable-debug" build and "journalctl --list-boots".
It gives the expected number of results. Additionally, if I then revert
the previous commit ("journal: map objects to context set by caller, not
to actual object type"), it crashes with SIGSEGV, as expected.
|
|
When the caller of journal_file_move_to_object() specifies type==0,
the object header is at first mapped in context 0. Then after the header
is checked, the whole object is mapped in a context determined by
the actual object type (which is not even range-checked using
type_to_context()). This looks wrong. It should map in the
caller-specified context.
An old comment in sd_journal_enumerate_unique() supports this view:
/* We do not use the type context here, but 0 instead,
* so that we can look at this data object at the same
* time as one on another file */
Clearly the expectation was that the data object will remain mapped
in context 0 without being pushed away by mapping other objects in
context OBJECT_DATA.
I suspect that this was the real bug that got fixed by ae97089d49
"journal: fix access to munmapped memory in sd_journal_enumerate_unique".
In other words, journal_file_object_keep/release are superfluous after
applying this patch.
|
|
This is useful for exposing unsafe access to mmapped objects after
the context that they were mapped in was already moved.
For example:
journal_file_move_to_object(f1, OBJECT_DATA, p1, &o1);
journal_file_move_to_object(f2, OBJECT_DATA, p2, &o2);
t = o1->object.type; /* this usually works, but is unsafe */
|
|
|
|
resulting name is actually valid
Also, rename filename_is_safe() to filename_is_valid(), since it
actually does a full validation for what the kernel will accept as file
name, it's not just a heuristic.
|
|
|
|
loop_write() didn't follow the usual systemd rules and returned status
partially in errno and required extensive checks from callers. Some of
the callers dealt with this properly, but many did not, treating
partial writes as successful. Simplify things by conforming to usual rules.
|
|
Let's add some syntactic sugar for iterating through inotify events, and
use it everywhere.
|
|
candidate entry
Suppose that while iterating we have already looked into a journal file
and got a candidate for the next entry. And we are considering to look
into another journal file because it may contain an entry that is nearer
to the current location than the candidate.
We should skip the whole journal file if we can tell by looking at its
header that none of its entries can precede the candidate.
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m20.518s
user 0m19.989s
sys 0m0.328s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m9.445s
user 0m9.228s
sys 0m0.213s
|
|
Interleaving of entries from many journal files is expensive. But there
is room for optimization.
We can skip looking into journal files whose entries all lie before the
current iterating location. We can tell if that's the case from looking
at the journal file header. This saves a huge amount of work if one has
many of mostly not interleaved journal files.
On my workstation with 90 journal files in /var/log/journal/ID/
totalling 3.4 GB I get these results:
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 5m54.258s
user 2m4.263s
sys 3m48.965s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m20.518s
user 0m19.989s
sys 0m0.328s
The high "sys" time in the original was caused by putting more stress on
the mmap-cache than it could handle. With the patch the working set
now consists of fewer mmap windows and mmap-cache is not thrashing.
|
|
In the case where no entries have been added to the journal after the specified
cursor, set need_seek before the main loop to prevent display of the entry at
said cursor.
|
|
With DIRECTION_UP (i.e. navigating backwards) in generic_array_bisect() when the
needle was found as the last item in the array, it wasn't actually processed as
match, resulting in entries being missed.
https://bugs.freedesktop.org/show_bug.cgi?id=86855
|
|
This is mostly likely the audit socket, and we really should close it
if we cannot make sense of it, since as long as it is open the kernel
might disable the kmsg forwarding of audit msgs, and we should avoid
that, since audit msgs might get completely lost then.
I also downgraded the log message we show a bit, after all things should
really work fine, and we proceed fine with it.
|
|
Otherwise they can be optimized away with -DNDEBUG
|
|
|
|
|
|
rather than heap
|