summaryrefslogtreecommitdiff
path: root/src/shared
AgeCommit message (Collapse)Author
2016-12-07dissect: add DISSECT_IMAGE_DISCARD_ANY maskZbigniew Jędrzejewski-Szmek
This makes the code to set arg_flags much more readable.
2016-12-07nspawn/dissect: automatically discover dm-verity verity partitionsLennart Poettering
This adds support for discovering and making use of properly tagged dm-verity data integrity partitions. This extends both systemd-nspawn and systemd-dissect with a new --root-hash= switch that takes the root hash to use for the root partition, and is otherwise fully automatic. Verity partitions are discovered automatically by GPT table type UUIDs, as listed in https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/ (which I updated prior to this change, to include new UUIDs for this purpose. mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images that carry the necessary integrity data. With that PR and this commit, the following simply lines suffice to boot up an integrity-protected container image: ``` # mkdir test # cd test # mkosi --verity # systemd-nspawn -i ./image.raw -bn ``` Note that mkosi writes the image file to "image.raw" next to a a file "image.roothash" that contains the root hash. systemd-nspawn will look for that file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07dissect: add support for encrypted imagesLennart Poettering
This adds support to the image dissector to deal with encrypted images (only LUKS). Given that we now have a neatly isolated image dissector codebase, let's add a new feature to it: support for automatically dealing with encrypted images. This is then exposed in systemd-dissect and nspawn. It's pretty basic: only support for passphrase-based encryption. In order to ensure that "systemd-dissect --mount" results in mount points whose backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at the moment doesn't provide a proper API for this. Thankfully, the ioctl() API is pretty easy to use.
2016-12-07dissect: add small "systemd-dissect" tool as wrapper around dissect-image.cLennart Poettering
This adds a small tool that may be used to look into OS images, and mount them to any place. This is mostly a friendlier version of test-dissect-image.c. I am not sure this should really become a proper command of systemd, hence for now do not install it into bindir, but simply libexecdir. This tool is already pretty useful since you can mount image files with it, honouring the various partitions correctly. I figure this is going to become more interesting if the dissctor learns luks and verity support.
2016-12-07util-lib: split out image dissecting code and loopback code from nspawnLennart Poettering
This adds two new APIs to systemd: - loop-util.h is a simple internal API for allocating, setting up and releasing loopback block devices. - dissect-image.h is an internal API for taking apart disk images and figuring out what the purpose of each partition is. Both APIs are basically refactored versions of similar code in nspawn. This rework should permit us to reuse this in other places than just nspawn in the future. Specifically: to implement RootImage= in the service image, similar to RootDirectory=, but operating on a disk image; to unify the gpt-auto-discovery generator code with the discovery logic in nspawn; to add new API to machined for determining the OS version of a disk image (i.e. not just running containers). This PR does not make any such changes however, it just provides the new reworked API. The reworked code is also slightly more powerful than the nspawn original one. When pointing it to an image or block device with a naked file system (i.e. no partition table) it will simply make it the root device.
2016-12-01tree-wide: stop using canonicalize_file_name(), use chase_symlinks() insteadLennart Poettering
Let's use chase_symlinks() everywhere, and stop using GNU canonicalize_file_name() everywhere. For most cases this should not change behaviour, however increase exposure of our function to get better tested. Most importantly in a few cases (most notably nspawn) it can take the correct root directory into account when chasing symlinks.
2016-11-29bus-util: add protocol error type explanationJouke Witteveen
2016-11-22Merge pull request #4692 from poettering/networkd-dhcpZbigniew Jędrzejewski-Szmek
Various networkd/DHCP fixes.
2016-11-22nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshotLennart Poettering
Given that other file systems (notably: xfs) support reflinks these days, let's extend the file system snapshotting logic to fall back to plan copies or reflinks when full btrfs subvolume snapshots are not available. This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn --template=" available on non-btrfs subvolumes. Of course, both operations will still be slower on non-btrfs than on btrfs (simply because reflinking each file individually in a directory tree is still slower than doing this in one step for a whole subvolume), but it's probably good enough for many cases, and we should provide the users with the tools, they have to figure out what's good for them. Note that "machinectl clone" already had a fallback like this in place, this patch generalizes this, and adds similar support to our other cases.
2016-11-22nspawn: add ability to run nspawn without container locks appliedLennart Poettering
This adds a new undocumented env var $SYSTEMD_NSPAWN_LOCK. When set to "0", nspawn will not attempt to lock the image. Fixes: #4037
2016-11-22shared: make sure image_path_lock() return parameters are always initialized ↵Lennart Poettering
on success We forgot to initialize the "global" return parameter in one case. Fix that.
2016-11-21seccomp: add @filesystem syscall group (#4537)Lennart Poettering
@filesystem groups various file system operations, such as opening files and directories for read/write and stat()ing them, plus renaming, deleting, symlinking, hardlinking.
2016-11-21shared: add new API to validate a string as hostname or IP addressLennart Poettering
2016-11-18Merge pull request #4538 from fbuihuu/confirm-spawn-fixesLennart Poettering
Confirm spawn fixes/enhancements
2016-11-16system-run: add support for configuring unit dependencies with --property=Lennart Poettering
Support on the server side has already been in place for quite some time, let's also add support on the client side for this.
2016-11-16core: GC redundant device jobs from the run queueLennart Poettering
In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works for fine all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921
2016-11-16shared: split out code for adding multiple names to sd_bus_track objectLennart Poettering
Let's introduce a new call bus_track_add_name_many() that adds a string list to a tracking object.
2016-11-15bus-util: print RestrictNamespaces= as a stringDjalal Harouni
Allow all callers that want to print RestrictNamespaces= returned from D-Bus as a string instead of a u64 value.
2016-11-11tree-wide: make invocations of extract_first_word more uniform (#4627)Zbigniew Jędrzejewski-Szmek
extract_first_words deals fine with the string being NULL, so drop the upfront check for that.
2016-11-10nfsflags: drop useless include file 'seccomp-util.h'Franck Bui
This also fixes the build when seccomp is disabled.
2016-11-08Merge pull request #4536 from poettering/seccomp-namespacesZbigniew Jędrzejewski-Szmek
core: add new RestrictNamespaces= unit file setting Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
2016-11-08Merge pull request #4612 from keszybz/format-stringsZbigniew Jędrzejewski-Szmek
Format string tweaks (and a small fix on 32bit)
2016-11-08Merge pull request #4509 from keszybz/foreach-word-quotedMartin Pitt
Remove FOREACH_WORD_QUOTED
2016-11-07tree-wide: add PRI_[NU]SEC, and use time format strings moreZbigniew Jędrzejewski-Szmek
2016-11-07Rename formats-util.h to format-util.hZbigniew Jędrzejewski-Szmek
We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.
2016-11-05tree-wide: drop unneded WHITESPACE param to extract_first_wordZbigniew Jędrzejewski-Szmek
It's the default, and NULL is shorter.
2016-11-05Merge pull request #4578 from evverx/no-hostname-memleakRonny Chevalier
journalctl: fix memleak
2016-11-04core: add new RestrictNamespaces= unit file settingLennart Poettering
This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.
2016-11-03Merge pull request #4548 from keszybz/seccomp-helpZbigniew Jędrzejewski-Szmek
systemd-analyze syscall-filter
2016-11-03acl-util: fix memleakEvgeny Vereshchagin
Fixes: $ ./libtool --mode execute valgrind --leak-check=full ./journalctl >/dev/null ==22309== Memcheck, a memory error detector ==22309== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==22309== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info ==22309== Command: /home/vagrant/systemd/.libs/lt-journalctl ==22309== Hint: You are currently not seeing messages from other users and the system. Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages. Pass -q to turn off this notice. ==22309== ==22309== HEAP SUMMARY: ==22309== in use at exit: 8,680 bytes in 4 blocks ==22309== total heap usage: 5,543 allocs, 5,539 frees, 9,045,618 bytes allocated ==22309== ==22309== 488 (56 direct, 432 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 4 ==22309== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299) ==22309== by 0x6F37A0A: __new_var_obj_p (__libobj.c:36) ==22309== by 0x6F362F7: __acl_init_obj (acl_init.c:28) ==22309== by 0x6F37731: __acl_from_xattr (__acl_from_xattr.c:54) ==22309== by 0x6F36087: acl_get_file (acl_get_file.c:69) ==22309== by 0x4F15752: acl_search_groups (acl-util.c:172) ==22309== by 0x113A1E: access_check_var_log_journal (journalctl.c:1836) ==22309== by 0x113D8D: access_check (journalctl.c:1889) ==22309== by 0x115681: main (journalctl.c:2236) ==22309== ==22309== LEAK SUMMARY: ==22309== definitely lost: 56 bytes in 1 blocks ==22309== indirectly lost: 432 bytes in 1 blocks ==22309== possibly lost: 0 bytes in 0 blocks ==22309== still reachable: 8,192 bytes in 2 blocks ==22309== suppressed: 0 bytes in 0 blocks
2016-11-03journalctl: fix memleakEvgeny Vereshchagin
bash-4.3# journalctl --no-hostname >/dev/null ================================================================= ==288==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48492 byte(s) in 2694 object(s) allocated from: #0 0x7fb4aba13e60 in malloc (/lib64/libasan.so.3+0xc6e60) #1 0x7fb4ab5b2cc4 in malloc_multiply src/basic/alloc-util.h:70 #2 0x7fb4ab5b3194 in parse_field src/shared/logs-show.c:98 #3 0x7fb4ab5b4918 in output_short src/shared/logs-show.c:347 #4 0x7fb4ab5b7cb7 in output_journal src/shared/logs-show.c:977 #5 0x5650e29cd83d in main src/journal/journalctl.c:2581 #6 0x7fb4aabdb730 in __libc_start_main (/lib64/libc.so.6+0x20730) SUMMARY: AddressSanitizer: 48492 byte(s) leaked in 2694 allocation(s). Closes: #4568
2016-11-03Merge pull request #4510 from keszybz/tree-wide-cleanupsLennart Poettering
Tree wide cleanups
2016-11-03seccomp-util, analyze: export comments as a help stringZbigniew Jędrzejewski-Szmek
Just to make the whole thing easier for users.
2016-11-03seccomp-util: move @default to the first positionZbigniew Jędrzejewski-Szmek
Now that the list is user-visible, @default should be first.
2016-11-02Do not raise in switch root if paths are too longZbigniew Jędrzejewski-Szmek
If we encounter the (unlikely) situation where the combined path to the new root and a path to a mount to be moved together exceed maximum path length, we shouldn't crash, but fail this path instead.
2016-11-02seccomp: add two new syscall groupsLennart Poettering
@resources contains various syscalls that alter resource limits and memory and scheduling parameters of processes. As such they are good candidates to block for most services. @basic-io contains a number of basic syscalls for I/O, similar to the list seccomp v1 permitted but slightly more complete. It should be useful for building basic whitelisting for minimal sandboxes
2016-11-02seccomp: include pipes and memfd in @ipcLennart Poettering
These system calls clearly fall in the @ipc category, hence should be listed there, simply to avoid confusion and surprise by the user.
2016-11-02seccomp: drop execve() from @process listLennart Poettering
The system call is already part in @default hence implicitly allowed anyway. Also, if it is actually blocked then systemd couldn't execute the service in question anymore, since the application of seccomp is immediately followed by it.
2016-11-02seccomp: add clock query and sleeping syscalls to "@default" groupLennart Poettering
Timing and sleep are so basic operations, it makes very little sense to ever block them, hence don't.
2016-11-01seccomp: allow specifying arm64, mips, ppc (#4491)Zbigniew Jędrzejewski-Szmek
"Secondary arch" table for mips is entirely speculative…
2016-10-27Merge pull request #4442 from keszybz/detect-virt-usernsEvgeny Vereshchagin
detect-virt: add --private-users switch to check if a userns is active; add Condition=private-users
2016-10-26condition: simplify condition_test_virtualizationZbigniew Jędrzejewski-Szmek
Rewrite the function to be slightly simpler. In particular, if a specific match is found (like ConditionVirtualization=yes), simply return an answer immediately, instead of relying that "yes" will not be matched by any of the virtualization names below. No functional change.
2016-10-26shared/condition: add ConditionVirtualization=[!]private-usersZbigniew Jędrzejewski-Szmek
This can be useful to silence warnings about units which fail in userns container.
2016-10-24seccomp: add test-seccomp test toolLennart Poettering
This validates the system call set table and many of our seccomp-util.c APIs.
2016-10-24seccomp: add new helper call seccomp_load_filter_set()Lennart Poettering
This allows us to unify most of the code in apply_protect_kernel_modules() and apply_private_devices().
2016-10-24seccomp: two fixes for the syscall set tablesLennart Poettering
"oldumount()" is not a syscall, but simply a wrapper for it, the actual syscall nr is called "umount" (and the nr of umount() is called umount2 internally). "sysctl()" is not a syscall, but "_syscall()" is. Fix this in the table. Without these changes libseccomp cannot actually translate the tables in full. This wasn't noticed before as the code was written defensively for this case.
2016-10-24seccomp: add new seccomp_init_conservative() helperLennart Poettering
This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence aded additional "*" here and there. Remove that.
2016-10-24core: rework syscall filter set handlingLennart Poettering
A variety of fixes: - rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main instance of it (the syscall_filter_sets[] array) used to abbreviate "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not mix and match too wildly. Let's pick the shorter name in this case, as it is sufficiently well established to not confuse hackers reading this. - Export explicit indexes into the syscall_filter_sets[] array via an enum. This way, code that wants to make use of a specific filter set, can index it directly via the enum, instead of having to search for it. This makes apply_private_devices() in particular a lot simpler. - Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to find a set by its name, seccomp_add_syscall_filter_set() to add a set to a seccomp object. - Update SystemCallFilter= parser to use extract_first_word(). Let's work on deprecating FOREACH_WORD_QUOTED(). - Simplify apply_private_devices() using this functionality
2016-10-24update-done: minor clean-upsLennart Poettering
This is a follow-up for fb8b0869a7bc30e23be175cf978df23192d59118, and makes a couple of minor clean-up changes: - The field name in the timestamp file is changed from "TimestampNSec=" to "TIMESTAMP_NSEC=". This is done simply to reflect the fact that we parse the file with the env var file parser, and hence the contents should better follow the usual capitalization of env vars, i.e. be all uppercase. - Needless negation of the errno parameter log_error_errno() and friends has been removed. - Instead of manually calculating the nsec remainder of the timestamp, use timespec_store(). - We now check whether we were able to write the timestamp file in full with fflush_and_check() the way we usually do it.
2016-10-24shared, systemctl: teach is-enabled to show installation targetsJan Synacek
It may be desired by users to know what targets a particular service is installed into. Improve user friendliness by teaching the is-enabled command to show such information when used with --full. This patch makes use of the newly added UnitFileFlags and adds UNIT_FILE_DRY_RUN flag into it. Since the API had already been modified, it's now easy to add the new dry-run feature for other commands as well. As a next step, --dry-run could be added to systemctl, which in turn might pave the way for a long requested dry-run feature when running systemctl start.