summaryrefslogtreecommitdiff
path: root/src/journal/compress.c
AgeCommit message (Collapse)Author
2015-12-13journal: fix reporting of output size in compres_stream_lz4Zbigniew Jędrzejewski-Szmek
The header is 7 bytes, and this size was not accounted for in total_out. This means that we could create a file that was 7 bytes longer than requested, and the debug output was also inconsistent.
2015-12-13journal: add dst_allocated_size parameter for compress_blobZbigniew Jędrzejewski-Szmek
compress_blob took src, src_size, dst and *dst_size, but dst_size wasn't used as an input parameter with the size of dst, but only as an output parameter. dst was implicitly assumed to be at least src_size-1. This code wasn't *wrong*, because the only real caller in journal-file.c got it right. But it was misleading, and the tests in test-compress.c got it wrong, and worked only because the output buffer happened to be the same size as input buffer. So add a seperate dst_allocated_size parameter to make it explicit what the size of the buffer is, and to allow test to proceed with different output buffer sizes.
2015-12-13journal: in some cases we have to decompress the full lz4 fieldZbigniew Jędrzejewski-Szmek
lz4 has to decompress a whole "sequence" at a time. When the compressed data is composed of a repeating pattern, the whole set of repeats has do be docompressed, and the output buffer has to be big enough. This is unfortunate, because potentially the slowdown is very big. We are only interested in the field name, but we might have to decompress the whole thing. But the full cost will be borne out only when the full entry is a repeating pattern. In practice this shouldn't happen (apart from tests and the like). Hopefully lz4 will be fixed to avoid this problem, or it will grow a new function which we can use [1], so this fix should be remporary. [1] https://groups.google.com/d/msg/lz4c/_3kkz5N6n00/oTahzqErCgAJ
2015-12-02lz4: fix size check which had no chance of working on big-endianZbigniew Jędrzejewski-Szmek
2015-10-27util-lib: split out allocation calls into alloc-util.[ch]Lennart Poettering
2015-10-27util-lib: move string table stuff into its own string-table.[ch]Lennart Poettering
2015-10-26util-lib: split out IO related calls to io-util.[ch]Lennart Poettering
2015-10-25Merge pull request #1654 from poettering/util-libTom Gundersen
Various changes to src/basic/
2015-10-25util-lib: split out fd-related operations into fd-util.[ch]Lennart Poettering
There are more than enough to deserve their own .c file, hence move them over.
2015-10-24util-lib: split our string related calls from util.[ch] into its own file ↵Lennart Poettering
string-util.[ch] There are more than enough calls doing string manipulations to deserve its own files, hence do something about it. This patch also sorts the #include blocks of all files that needed to be updated, according to the sorting suggestions from CODING_STYLE. Since pretty much every file needs our string manipulation functions this effectively means that most files have sorted #include blocks now. Also touches a few unrelated include files.
2015-10-24journal: irrelevant coding style fixesLennart Poettering
2015-10-14compress: return errors without logging, do not fake errnoZbigniew Jędrzejewski-Szmek
Logging for compression and decompression is assymetrical on purpose: if compiled without some type of compression, those compression code paths should never be invoked. OTOH, it is possible to encounter unsupported format on decompression, so leave those log_debug statements in, to make it easier to diagnose stuff.
2015-10-14compress: fix mmap error handlingZbigniew Jędrzejewski-Szmek
2015-10-10coredump: use lz4frame api to compress coredumpsZbigniew Jędrzejewski-Szmek
This converts the stream compression to use the new lz4frame api, compatible with lz4cat. Previous code used custom headers, so the compressed file was not compatible with lz4 command line tools. I considered this the last blocker to using lz4 by default. Speed seems to be reasonable, although a bit (a few percent) slower than the lz4 binary, even though compression is the same. I don't consider this important. It could be caused by the overhead of library calls, but is probably caused by slightly different buffer sizes or such. The code in this patch uses mmap, since since this allows the buffer to be reused while not making the code more complicated at all. In my testing, this version is noticably faster (~20%) than a naive single-buffered version. mmap can cause the program to be killed with SIGBUS, if the underlying file is truncated or a disk error occurs. We only use this from within coredump and coredumpctl, so I don't consider this an issue. Old decompression code is retained and is used if the new code fails indicating a format error. There have been reports of various smaller distributions using previous lz4 code, i.e. the old format, and it is nice to provide backwards compatibility. We can remove the legacy code in a few versions. The way that blobs are compressed in the journal is not affected.
2015-09-10tree-wide: never use the off_t unless glibc makes us use itLennart Poettering
off_t is a really weird type as it is usually 64bit these days (at least in sane programs), but could theoretically be 32bit. We don't support off_t as 32bit builds though, but still constantly deal with safely converting from off_t to other types and back for no point. Hence, never use the type anymore. Always use uint64_t instead. This has various benefits, including that we can expose these values directly as D-Bus properties, and also that the values parse the same in all cases.
2015-03-09Introduce loop_read_exact helperZbigniew Jędrzejewski-Szmek
Usually when using loop_read(), we want to read the full buffer. Add a helper that mirrors loop_write(), and returns 0 when full buffer was read, and an error otherwise. Use -ENODATA for the short read, to distinguish it from a read error.
2015-02-23remove unused includesThomas Hindoe Paaboel Andersen
This patch removes includes that are not used. The removals were found with include-what-you-use which checks if any of the symbols from a header is in use.
2015-01-22Assorted format fixesZbigniew Jędrzejewski-Szmek
Types used for pids and uids in various interfaces are unpredictable. Too bad.
2014-12-09treewide: sanitize loop_writeZbigniew Jędrzejewski-Szmek
loop_write() didn't follow the usual systemd rules and returned status partially in errno and required extensive checks from callers. Some of the callers dealt with this properly, but many did not, treating partial writes as successful. Simplify things by conforming to usual rules.
2014-08-30journal/compress: use LZ4_compress_continue()Evangelos Foutras
We can't use LZ4_compress_limitedOutput_continue() because in the worst-case scenario the compressed output can be slightly bigger than the input block. This generally affects very few blocks and is no reason to abort the compression process. I ran into this when I noticed that Chromium core dumps weren't being compressed. After switching to LZ4_compress_continue() a ~330MB Chromium core dump gets compressed to ~17M.
2014-08-03Fix misuse of uint64_t as size_tZbigniew Jędrzejewski-Szmek
They have different size on 32 bit, so they are really not interchangable.
2014-07-18compress: fix return valueZbigniew Jędrzejewski-Szmek
2014-07-11Fix build without any compression enabledZbigniew Jędrzejewski-Szmek
2014-07-08journal/compress: improve xz compression performanceJon Severinsson
The new lzma2 compression options at the top of compress_blob_xz are equivalent to using preset "0", exept for using a 1 MiB dictionary (the same as preset "1"). This makes the memory usage at most 7.5 MiB in the compressor, and 1 MiB in the decompressor, instead of the previous 92 MiB in the compressor and 8 MiB in the decompressor. According to test-compress-benchmark this commit makes XZ compression 20 times faster, with no increase in compressed data size. Using more realistic test data (an ELF binary rather than repeating ASCII letters 'a' through 'z' in order) it only provides a factor 10 speedup, and at a cost if a 10% increase in compressed data size. But that is still a worthwhile trade-off. According to test-compress-benchmark XZ compression is still 25 times slower than LZ4, but the compressed data is one eighth the size. Using more realistic test data XZ compression is only 18 times slower than LZ4, and the compressed data is only one quarter the size. $ ./test-compress-benchmark XZ: compressed & decompressed 2535300963 bytes in 42.30s (57.15MiB/s), mean compresion 99.95%, skipped 3570 bytes LZ4: compressed & decompressed 2535303543 bytes in 1.60s (1510.60MiB/s), mean compresion 99.60%, skipped 990 bytes
2014-07-06journal: add LZ4 as optional compressorZbigniew Jędrzejewski-Szmek
Add liblz4 as an optional dependency when requested with --enable-lz4, and use it in preference to liblzma for journal blob and coredump compression. To retain backwards compatibility, XZ is used to decompress old blobs. Things will function correctly only with lz4-119. Based on the benchmarks found on the web, lz4 seems to be the best choice for "quick" compressors atm. For pkg-config status, see http://code.google.com/p/lz4/issues/detail?id=135.
2014-07-06journal/compress: return early in uncompress_startswithZbigniew Jędrzejewski-Szmek
uncompress_startswith would always decode the whole stream, even if it did not start with the given prefix. Reallocation policy was also strange.
2014-06-26coredump: make compression configurableZbigniew Jędrzejewski-Szmek
Add Compression={none,xz} and CompressionLevel=0-9 settings. Defaults are xz/6. Compression=filesystem may be added later. I picked "xz" for the compression "type", since we might want to add different compressors later on. XZ is fairly memory and CPU intensive, and embedded users will likely want to use LZO or some other lightweight compression mechanism.
2014-06-26journal/compress: add stream compression/decompression functionsZbigniew Jędrzejewski-Szmek
2014-06-26journal/compress: simplify compress_blobZbigniew Jędrzejewski-Szmek
2012-11-21journal: by default do not decompress dat objects larger than 64KLennart Poettering
This introduces a new data threshold setting for sd_journal objects which controls the maximum size of objects to decompress. This is relieves the library from having to decompress full data objects even if a client program is only interested in the initial part of them. This speeds up "systemd-coredumpctl" drastically when invoked without parameters.
2012-04-12relicense to LGPLv2.1 (with exceptions)Lennart Poettering
We finally got the OK from all contributors with non-trivial commits to relicense systemd from GPL2+ to LGPL2.1+. Some udev bits continue to be GPL2+ for now, but we are looking into relicensing them too, to allow free copy/paste of all code within systemd. The bits that used to be MIT continue to be MIT. The big benefit of the relicensing is that closed source code may now link against libsystemd-login.so and friends.
2011-12-21journal: add missing compress.[ch]Lennart Poettering