Age | Commit message | Author |
|
Remove the old version of the lz4 stream compressor
|
|
The header is 7 bytes, and this size was not accounted for in
total_out. This means that we could create a file that was 7 bytes
longer than requested, and the debug output was also inconsistent.
|
|
compress_blob took src, src_size, dst and *dst_size, but dst_size was
used only as an output parameter, never as an input parameter carrying
the size of dst. dst was implicitly assumed to hold at least src_size-1
bytes. This code wasn't *wrong*, because the only real caller in
journal-file.c got it right. But it was misleading, and the tests in
test-compress.c got it wrong, and worked only because the output
buffer happened to be the same size as the input buffer. So add a separate
dst_allocated_size parameter to make it explicit what the size of the
buffer is, and to allow the tests to proceed with different output buffer
sizes.
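
The parameter layout described above can be sketched as follows. This is only an illustration: copy_blob is a hypothetical stand-in that copies its input instead of compressing it, so that the example does not depend on liblzma or liblz4, and the choice of -ENOBUFS as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for compress_blob(): it copies the input rather than
 * compressing it. What matters is the parameter layout:
 *   dst_allocated_size  input only, the capacity of dst
 *   *dst_size           output only, the number of bytes produced */
static int copy_blob(const void *src, size_t src_size,
                     void *dst, size_t dst_allocated_size, size_t *dst_size) {
        assert(src);
        assert(dst);
        assert(dst_size);

        /* Refuse to write past the end of dst instead of assuming its size.
         * The error code is an assumption made for this sketch. */
        if (src_size > dst_allocated_size)
                return -ENOBUFS;

        memcpy(dst, src, src_size);
        *dst_size = src_size;
        return 0;
}
```

With the capacity passed explicitly, a test can hand in an output buffer smaller than the input and expect a clean error rather than an overflow.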
|
|
lz4 has to decompress a whole "sequence" at a time. When the compressed
data is composed of a repeating pattern, the whole set of repeats has
to be decompressed, and the output buffer has to be big enough.
This is unfortunate, because potentially the slowdown is very big. We
are only interested in the field name, but we might have to decompress
the whole thing. But the full cost will be borne only when the
full entry is a repeating pattern. In practice this shouldn't happen
(apart from tests and the like). Hopefully lz4 will be fixed to avoid
this problem, or it will grow a new function which we can use [1], so
this fix should be temporary.
[1] https://groups.google.com/d/msg/lz4c/_3kkz5N6n00/oTahzqErCgAJ
|
|
|
|
|
|
|
|
|
|
Various changes to src/basic/
|
|
There are more than enough to deserve their own .c file, hence move them
over.
|
|
string-util.[ch]
There are more than enough calls doing string manipulations to deserve
their own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
|
|
|
|
This was the original lz4 file header, custom in systemd, that was
not compatible with the lz4 binary. It was not compiled in by default,
and was only used for coredumps stored as files on disk. It is safe to
remove it after a transition period in which coredumps have been
rotated.
|
|
Logging for compression and decompression is asymmetrical on purpose:
if compiled without some type of compression, those compression code
paths should never be invoked. OTOH, it is possible to encounter
an unsupported format on decompression, so leave those log_debug statements
in, to make it easier to diagnose stuff.
|
|
|
|
This converts the stream compression to use the new lz4frame api,
compatible with lz4cat. Previous code used custom headers, so the
compressed file was not compatible with lz4 command line tools.
I considered this the last blocker to using lz4 by default.
Speed seems to be reasonable, although a bit (a few percent) slower
than the lz4 binary, even though compression is the same. I don't
consider this important. It could be caused by the overhead of library
calls, but is probably caused by slightly different buffer sizes or
such. The code in this patch uses mmap, since this allows the
buffer to be reused while not making the code more complicated at all.
In my testing, this version is noticeably faster (~20%) than a naive
single-buffered version. mmap can cause the program to be killed with
SIGBUS, if the underlying file is truncated or a disk error occurs. We
only use this from within coredump and coredumpctl, so I don't
consider this an issue.
Old decompression code is retained and is used if the new code fails
in a way that indicates a format error. There have been reports of various smaller
distributions using previous lz4 code, i.e. the old format, and it is
nice to provide backwards compatibility. We can remove the legacy code
in a few versions.
The way that blobs are compressed in the journal is not affected.
|
|
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
builds with a 32bit off_t though, yet we still constantly deal with safely
converting from off_t to other types and back for no benefit.
Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
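
One way to follow this rule where the OS still hands back an off_t (for example st_size) is to convert once at the boundary and carry uint64_t from there on. The helper below is a hypothetical sketch; its name and the -ERANGE error code are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical boundary helper: turn an off_t received from the OS into a
 * uint64_t exactly once, rejecting negative values, so that the rest of the
 * code never has to deal with off_t. Assumes a 64bit off_t, as the text
 * above does. */
static int off_to_u64(off_t off, uint64_t *ret) {
        assert(ret);

        if (off < 0)
                return -ERANGE;

        *ret = (uint64_t) off;
        return 0;
}
```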
|
|
Usually when using loop_read(), we want to read the full buffer.
Add a helper that mirrors loop_write(), and returns 0 when the full buffer
was read, and an error otherwise.
Use -ENODATA for the short read, to distinguish it from a read error.
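
A minimal sketch of such a helper, built directly on read(2) rather than on loop_read(). The name read_exact_sketch is hypothetical, and polling and EAGAIN handling for non-blocking file descriptors are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical "read everything or fail" helper.
 * Returns 0 if exactly nbytes were read, -ENODATA if EOF was hit first,
 * and a negative errno value on a read error. */
static int read_exact_sketch(int fd, void *buf, size_t nbytes) {
        uint8_t *p = buf;

        assert(fd >= 0);
        assert(buf);

        while (nbytes > 0) {
                ssize_t k = read(fd, p, nbytes);

                if (k < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        return -errno;          /* genuine read error */
                }
                if (k == 0)
                        return -ENODATA;        /* short read: EOF before the buffer was full */

                p += k;
                nbytes -= (size_t) k;
        }

        return 0;
}
```

Callers can then write `r = read_exact_sketch(fd, &header, sizeof(header)); if (r < 0) return r;` without comparing byte counts themselves.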
|
|
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
Types used for pids and uids in various interfaces are unpredictable.
Too bad.
|
|
loop_write() didn't follow the usual systemd rules and returned status
partially in errno and required extensive checks from callers. Some of
the callers dealt with this properly, but many did not, treating
partial writes as successful. Simplify things by conforming to usual rules.
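
The "usual rules" here mean: return 0 on success and a negative errno-style value on failure, with nothing left in errno for the caller to inspect. A minimal sketch of a write loop following that convention is shown below; the name write_all_sketch and the use of -EIO for a zero-length write are assumptions, and non-blocking descriptors are not handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical write loop: either the whole buffer is written and 0 is
 * returned, or a negative errno-style code is returned. A partial write is
 * never reported as success. */
static int write_all_sketch(int fd, const void *buf, size_t nbytes) {
        const uint8_t *p = buf;

        assert(buf);

        while (nbytes > 0) {
                ssize_t k = write(fd, p, nbytes);

                if (k < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        return -errno;
                }
                if (k == 0)
                        return -EIO;            /* no progress; error code is an assumption */

                p += k;
                nbytes -= (size_t) k;
        }

        return 0;
}
```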
|
|
We can't use LZ4_compress_limitedOutput_continue() because in the
worst-case scenario the compressed output can be slightly bigger than
the input block. This generally affects very few blocks and is no reason
to abort the compression process.
I ran into this when I noticed that Chromium core dumps weren't being
compressed. After switching to LZ4_compress_continue() a ~330MB Chromium
core dump gets compressed to ~17M.
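
The worst case mentioned above can be quantified. lz4.h defines the LZ4_COMPRESSBOUND() macro as isize + isize/255 + 16 for input sizes up to LZ4_MAX_INPUT_SIZE (0x7E000000), and 0 beyond that. The sketch below reproduces that arithmetic without linking against liblz4; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors LZ4_MAX_INPUT_SIZE from lz4.h; the name is local to this sketch. */
#define SKETCH_LZ4_MAX_INPUT_SIZE 0x7E000000u

/* Same arithmetic as LZ4_COMPRESSBOUND(): the largest size that compressing
 * isize bytes can produce, or 0 if the input is too large. */
static size_t lz4_bound_sketch(size_t isize) {
        if (isize > SKETCH_LZ4_MAX_INPUT_SIZE)
                return 0;

        return isize + isize / 255 + 16;
}
```

For a 64 KiB block this gives 65536 + 257 + 16 = 65809 bytes, so an output buffer sized exactly like the input is too small for incompressible data, which is the case the limited-output variant rejects.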
|
|
They have different sizes on 32 bit, so they are really not interchangeable.
|
|
|
|
|
|
The new lzma2 compression options at the top of compress_blob_xz are
equivalent to using preset "0", except for using a 1 MiB dictionary
(the same as preset "1"). This makes the memory usage at most 7.5 MiB
in the compressor, and 1 MiB in the decompressor, instead of the
previous 92 MiB in the compressor and 8 MiB in the decompressor.
According to test-compress-benchmark this commit makes XZ compression
20 times faster, with no increase in compressed data size.
Using more realistic test data (an ELF binary rather than repeating
ASCII letters 'a' through 'z' in order) it only provides a factor 10
speedup, and at the cost of a 10% increase in compressed data size.
But that is still a worthwhile trade-off.
According to test-compress-benchmark XZ compression is still 25 times
slower than LZ4, but the compressed data is one eighth the size.
Using more realistic test data XZ compression is only 18 times slower
than LZ4, and the compressed data is only one quarter the size.
$ ./test-compress-benchmark
XZ: compressed & decompressed 2535300963 bytes in 42.30s (57.15MiB/s), mean compresion 99.95%, skipped 3570 bytes
LZ4: compressed & decompressed 2535303543 bytes in 1.60s (1510.60MiB/s), mean compresion 99.60%, skipped 990 bytes
|
|
Add liblz4 as an optional dependency when requested with --enable-lz4,
and use it in preference to liblzma for journal blob and coredump
compression. To retain backwards compatibility, XZ is used to
decompress old blobs.
Things will function correctly only with lz4-119.
Based on the benchmarks found on the web, lz4 seems to be the best
choice for "quick" compressors atm.
For pkg-config status, see http://code.google.com/p/lz4/issues/detail?id=135.
|
|
uncompress_startswith would always decode the whole stream, even
if it did not start with the given prefix.
Reallocation policy was also strange.
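
The intended behaviour, decoding only as much as the prefix check needs and stopping at the first mismatch, can be sketched as follows. Everything here is hypothetical: fake_decoder stands in for an incremental decompressor and simply hands out bytes from memory in chunks of up to 4, and the fixed 64-byte scratch buffer replaces the real reallocation logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical incremental decoder backed by plain memory. */
struct fake_decoder {
        const char *data;
        size_t left;
        size_t calls;   /* number of decode steps performed */
};

/* Produces up to n bytes of "decoded" output, returns 0 at end of stream. */
static size_t fake_decode(struct fake_decoder *d, char *out, size_t n) {
        size_t k = d->left < n ? d->left : n;

        memcpy(out, d->data, k);
        d->data += k;
        d->left -= k;
        d->calls++;
        return k;
}

/* Returns 1 if the decoded stream starts with prefix, 0 if it does not,
 * and -ENOBUFS if the prefix does not fit the scratch buffer. */
static int startswith_sketch(struct fake_decoder *d, const char *prefix, size_t prefix_len) {
        char buf[64];
        size_t have = 0;

        assert(d);
        assert(prefix);

        if (prefix_len > sizeof(buf))
                return -ENOBUFS;

        while (have < prefix_len) {
                size_t chunk = prefix_len - have < 4 ? prefix_len - have : 4;
                size_t k = fake_decode(d, buf + have, chunk);

                if (k == 0)
                        return 0;       /* stream ended before the prefix was complete */
                have += k;

                /* Stop as soon as the decoded bytes diverge from the prefix. */
                if (memcmp(buf, prefix, have) != 0)
                        return 0;
        }

        return 1;
}
```

A stream that does not start with the prefix is rejected after a single decode step, and a matching stream is never decoded past the prefix length.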
|
|
Add Compression={none,xz} and CompressionLevel=0-9 settings. Defaults
are xz/6.
Compression=filesystem may be added later.
I picked "xz" for the compression "type", since we might want to add
different compressors later on. XZ is fairly memory and CPU intensive, and
embedded users will likely want to use LZO or some other lightweight compression
mechanism.
|
|
|
|
|
|
This introduces a new data threshold setting for sd_journal objects
which controls the maximum size of objects to decompress. This
relieves the library from having to decompress full data objects even
if a client program is only interested in the initial part of them.
This speeds up "systemd-coredumpctl" drastically when invoked without
parameters.
|
|
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.
|
|
|