Age | Commit message (Collapse) | Author |
|
|
|
fd_setcrtime()
|
|
btrfs' COW logic results in heavily fragment journal files, which is
detrimental for perfomance. Hence, turn off COW for journal files as we
create them.
Turning off COW comes at the cost of data integrity guarantees, but this
should be acceptable, given that we do our own checksumming, and
generally have a pretty conservative write pattern.
Also see discussion on linux-btrfs:
http://www.spinics.net/lists/linux-btrfs/msg41001.html
|
|
|
|
Our write pattern is quite awful for CoW file systems (btrfs...), as we
keep updating file parts in the beginning of the file. This results in
fragmented journal files. Hence: when rotating files, defragment them,
since at that point we know that no further write accesses will be made.
|
|
deleted, rotate
https://bugzilla.redhat.com/show_bug.cgi?id=1171719
|
|
journal file headers
Since the file headers might be replaced by zeroed pages now due to
sigbus we should make sure we don't end up dividing by zero because we
don't check values read from journal file headers for changes.
|
|
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.
This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1045810
|
|
|
|
The current offset is sufficient information.
|
|
When comparing the locations of candidate entries, we can rely on the
location information stored in struct JournalFile.
|
|
In next_beyond_location() when we find a candidate entry in a journal
file, save its location information in struct JournalFile.
The purpose of remembering the locations of candidate entries is to be
able to save work in the next iteration. This patch does only the
remembering part.
LOCATION_SEEK means the location identifies a candidate entry.
When a winner is picked from among candidates, it becomes
LOCATION_DISCRETE.
LOCATION_TAIL here signifies we've iterated the file to the end (or the
beginning in the case of reversed direction).
|
|
|
|
Its only caller is a test.
|
|
|
|
try_context() is such a hot path that the hashmap lookup is expensive.
The number of contexts is small - it is the number of object types.
Using a hashmap is overkill. A plain array will do.
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m9.445s
user 0m9.228s
sys 0m0.213s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m5.438s
user 0m5.266s
sys 0m0.170s
|
|
|
|
|
|
Note that numbers 0 and -1 are both replaced with OBJECT_UNUSED,
because they are treated the same everywhere (e.g. type_to_context()
translates them both to 0).
|
|
It has no other callers. It does not need to be in the header file.
|
|
The only user is sd_journal_enumerate_unique() and, as explained in
the previous commit (fed67c38e3 "journal: map objects to context set by
caller, not by actual object type"), the use of them there is now
superfluous. Let's remove them.
This reverts major parts of commits:
ae97089d49 journal: fix access to munmapped memory in
sd_journal_enumerate_unique
06cc69d44c sd-journal: fix sd_journal_enumerate_unique skipping values
Tested with an "--enable-debug" build and "journalctl --list-boots".
It gives the expected number of results. Additionally, if I then revert
the previous commit ("journal: map objects to context set by caller, not
to actual object type"), it crashes with SIGSEGV, as expected.
|
|
When the caller of journal_file_move_to_object() specifies type==0,
the object header is at first mapped in context 0. Then after the header
is checked, the whole object is mapped in a context determined by
the actual object type (which is not even range-checked using
type_to_context()). This looks wrong. It should map in the
caller-specified context.
An old comment in sd_journal_enumerate_unique() supports this view:
/* We do not use the type context here, but 0 instead,
* so that we can look at this data object at the same
* time as one on another file */
Clearly the expectation was that the data object will remain mapped
in context 0 without being pushed away by mapping other objects in
context OBJECT_DATA.
I suspect that this was the real bug that got fixed by ae97089d49
"journal: fix access to munmapped memory in sd_journal_enumerate_unique".
In other words, journal_file_object_keep/release are superfluous after
applying this patch.
|
|
|
|
With DIRECTION_UP (i.e. navigating backwards) in generic_array_bisect() when the
needle was found as the last item in the array, it wasn't actually processed as
match, resulting in entries being missed.
https://bugs.freedesktop.org/show_bug.cgi?id=86855
|
|
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
|
|
coverity otherwise assumes that the chain object might be NULL.
|
|
The order of entries may matter here. Oldest entries are evicted first
when the cache is full.
(Though I don't see anything to rejuvenate entries on cache hits.)
|
|
sd_journal_enumerate_unique will lock its mmap window to prevent it
from being released by calling mmap_cache_get with keep_always=true.
This call may return windows that are wider, but compatible with the
parameters provided to it.
This can result in a mismatch where the window to be released cannot
properly be selected, because we have more than one window matching the
parameters of mmap_cache_release. Therefore, introduce a release_cookie
to be used when releasing the window.
https://bugs.freedesktop.org/show_bug.cgi?id=79380
|
|
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
|
|
If the journal is corrupted, we might return an object that does
not start with the expected field name and/or is shorter than it
should.
|
|
They have different size on 32 bit, so they are really not interchangable.
|
|
If a file was opened for writing, and then closed immediately without
actually writing any entries, on subsequent opening, it would be
considered "corrupted". This should be totally fine, and even in
read mode, an empty file can become non-empty later on.
|
|
|
|
|
|
Add liblz4 as an optional dependency when requested with --enable-lz4,
and use it in preference to liblzma for journal blob and coredump
compression. To retain backwards compatibility, XZ is used to
decompress old blobs.
Things will function correctly only with lz4-119.
Based on the benchmarks found on the web, lz4 seems to be the best
choice for "quick" compressors atm.
For pkg-config status, see http://code.google.com/p/lz4/issues/detail?id=135.
|
|
|
|
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
|
|
With a corrupted file, we can get in a situation where two entries
in the entry array point to the same object. Then journal_file_next_entry
will find the first one using generic_arrray_bisect, and try to move to
the second one, but since the address is the same, generic_array_get will
return the first one. journal_file_next_entry ends up in an infinite loop.
https://bugzilla.redhat.com/show_bug.cgi?id=1047039
|
|
As pointed-out by clang -Wunreachable-code.
No behaviour changes.
|
|
gcc (4.8.2, arm) does not understand that journal_file_append_field()
will always set 'fo' when it returns 0, so this warning is bogus.
Anyway, fix it by initialiting fo = NULL.
|
|
In trying to track down a stupid linker bug, I noticed a bunch of
memset() calls that should be using memzero() to make it more "obvious"
that the options are correct (i.e. 0 is not the length, but the data to
set). So fix up all current calls to memset(foo, 0, length) to
memzero(foo, length).
|
|
sd_j_e_u needs to keep a reference to an object while comparing it
with possibly duplicate objects in other files. Because the size of
mmap cache is limited, with enough files and object to compare to,
at some point the object being compared would be munmapped, resulting
in a segmentation fault.
Fix this issue by turning keep_always into a reference count that can
be increased and decreased. Other callers which set keep_always=true
are unmodified: their references are never released but are ignored
when the whole file is closed, which happens at some point. keep_always
is increased in sd_j_e_u and later on released.
|
|
Convert entry_array.items[0] to host byte order prior to passing it to
chain_cache_put().
[zj: also use le64toh in journal-verify.c]
https://bugs.freedesktop.org/show_bug.cgi?id=73194
|
|
SipHash appears to be the new gold standard for hashing smaller strings
for hashtables these days, so let's make use of it.
|
|
we also do 'last_index = (uint64_t) -1;' at the end of the while
loop so there is no reason to also do it here.
|
|
While all the libc implementations I know return NULL when memchr's size
parameter is 0, without accessing any memory, passing NULL to memchr is
still invalid:
C11 7.24.1p2: Where an argument declared as "size_t n" specifies the length
of the array for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a call
shall still have valid values, as described in 7.1.4. On such a call, a
function that locates a character finds no occurrence, a function that
compares two character sequences returns zero, and a function that copies
characters copies zero characters.
see http://llvm.org/bugs/show_bug.cgi?id=18247
|
|
|
|
let's just do a single fallocate() as far as possible, and don't
distuingish between allocated space and file size.
This way we can save a syscall for each append, which makes quite some
benefits.
|
|
|
|
chain element
|