summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-28coredumpctl: tighten print_field() codeZbigniew Jędrzejewski-Szmek
Propagate errors properly, so that if we hit oom or an error in the journal, the whole command will fail. This is important when using the output in scripts. Support the output of multiple values for the same field with -F. The journal supports that, and our official commands should too, as far as it makes sense. -F can be used to print user-defined fields (e.g. somebody could use a TAG field with multiple occurences), so we should support that too. That seems better than silently printing the last value found as was done before. We would iterate trying to match the same field with all possible field names. Once we find something, cut the loop short, since we know that nothing else can match.
2016-09-28coredumpctl: rework presence reportingZbigniew Jędrzejewski-Szmek
The column for "present" was easy to miss, especially if somebody had no coredumps present at all, in which case the column of spaces of width one wasn't visually distinguished from the neighbouring columns. Replace this with an explicit text, one of: "missing", "journal", "present", "error". $ coredumpctl TIME PID UID GID SIG COREFILE EXE Mon 2016-09-26 22:46:31 CEST 8623 0 0 11 missing /usr/bin/bash Mon 2016-09-26 22:46:35 CEST 8639 1001 1001 11 missing /usr/bin/bash Tue 2016-09-27 01:10:46 CEST 16110 1001 1001 11 journal /usr/bin/bash Tue 2016-09-27 01:13:20 CEST 16290 1001 1001 11 journal /usr/bin/bash Tue 2016-09-27 01:33:48 CEST 17867 1001 1001 11 present /usr/bin/bash Tue 2016-09-27 01:37:55 CEST 18549 0 0 11 error /usr/bin/bash Also, use access(…, R_OK), so that we can report a present but inaccessible file different than a missing one.
2016-09-28coredumpctl: report corefile presence properlyZbigniew Jędrzejewski-Szmek
In 'list', show present also for coredumps stored in the journal. In 'status', replace "File" with "Storage" line that is always present. Possible values: Storage: none Storage: journal Storage: /path/to/file (inacessible) Storage: /path/to/file Previously the File field be only present if the file was accessible, so users had to manually extract the file name precisely in the cases where it was needed, i.e. when coredumpctl couldn't access the file. It's much more friendly to always show something. This output is designed for human consumption, so it's better to be a bit verbose. The call to sd_j_set_data_threshold is moved, so that status is always printed with the default of 64k, list uses 4k, and coredump retrieval is done with the limit unset. This should make checking for the presence of the COREDUMP field not too costly.
2016-09-28coredumpctl: report user unit properlyZbigniew Jędrzejewski-Szmek
2016-09-28coredumpctl: fix spurious "more than one entry matches" warningZbigniew Jędrzejewski-Szmek
sd_journal_previous() returns 0 if it didn't do any move, so the warning was stupidly always printed.
2016-09-28coredumpctl: fix handling of files written to fdZbigniew Jędrzejewski-Szmek
Added in 9fe13294a9 (by me :[```), and later obfuscated in d0c8806d4ab, if an uncompressed external file or an internally stored coredump was supposed to be written to a file descriptor, nothing would be written.
2016-09-28coredump: remove Storage=both optionZbigniew Jędrzejewski-Szmek
Back when external storage was initially added in 34c10968cb, this mode of storage was added. This could have made some sense back when XZ compression was used, and an uncompressed core on disk could be used as short-lived cache file which does require costly decompression. But now fast LZ4 compression is used (by default) both internally and externally, so we have duplicated storage, using the same compression and same default maximum core size in both cases, but with different expiration lifetimes. Even the uncompressed-external, compressed-internal mode is not very useful: for small files, decompression with LZ4 is fast enough not to matter, and for large files, decompression is still relatively fast, but the disk-usage penalty is very big. An additional problem with the two modes of storage is that it complicates the code and makes it much harder to return a useful error message to the user if we cannot find the core file, since if we cannot find the file we have to check the internal storage first. This patch drops "both" storage mode. Effectively this means that if somebody configured coredump this way, they will get a warning about an unsupported value for Storage, and the default of "external" will be used. I'm pretty sure that this mode is very rarely used anyway.
2016-09-28man: remove duplicate "the" for systemctl --plain (#4230)Alfie John
2016-09-28journal: add stdout_stream_scan() comment (#4102)Vito Caputo
When s->length is zero this function doesn't do anything, note that in a comment.
2016-09-28Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1Evgeny Vereshchagin
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-27Merge pull request #4220 from keszybz/show-and-formatting-fixesMartin Pitt
Show and formatting fixes
2016-09-27basic: fix for IPv6 status (#4224)Susant Sahani
Even if ``` cat /proc/sys/net/ipv6/conf/all/disable_ipv6 1 ``` is disabled cat /proc/net/sockstat6 ``` TCP6: inuse 2 UDP6: inuse 1 UDPLITE6: inuse 0 RAW6: inuse 0 FRAG6: inuse 0 memory 0 ``` Looking for /proc/net/if_inet6 is the right choice.
2016-09-27test: make sure that {readonly|inaccessible|readwrite}paths disconnect mount ↵Djalal Harouni
propagation Better safe.
2016-09-27test: add tests for simple ReadOnlyPaths= caseDjalal Harouni
2016-09-26test-bus-creds: are more debugging infoZbigniew Jędrzejewski-Szmek
This test sometimes fails in semaphore, but not when run interactively, so it's hard to debug.
2016-09-26udev/path_id: introduce support for NVMe devices (#4169)Keith Busch
This appends the nvme name and namespace identifier attribute the the PCI path for by-path links. Symlinks like the following are now present: lrwxrwxrwx. 1 root root 13 Sep 16 12:12 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1 lrwxrwxrwx. 1 root root 15 Sep 16 12:12 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1 Cc: Michal Sekletar <sekletar.m@gmail.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
2016-09-26core: Fix USB functionfs activation and clarify its documentation (#4188)Paweł Szewczyk
There was no certainty about how the path in service file should look like for usb functionfs activation. Because of this it was treated differently in different places, which made this feature unusable. This patch fixes the path to be the *mount directory* of functionfs, not ep0 file path and clarifies in the documentation that ListenUSBFunction should be the location of functionfs mount point, not ep0 file itself.
2016-09-26machinectl: prefer user@ to --uid=user for shell (#4006)Zbigniew Jędrzejewski-Szmek
It seems to me that the explicit positional argument should have higher priority than "an option".
2016-09-26journald,ratelimit: fix wrong calculation of burst_modulate() (#4218)HATAYAMA Daisuke
This patch fixes wrong calculation of burst_modulate(), which now calculates the values smaller than really expected ones if available disk space is strictly more than 1MB. In particular, if available disk space is strictly more than 1MB and strictly less than 16MB, the resulted value becomes smaller than its original one. >>> (math.log2(1*1024**2)-16) / 4 1.0 >>> (math.log2(16*1024**2)-16) / 4 2.0 >>> (math.log2(256*1024**2)-16) / 4 3.0 → This matches the comment in the function.
2016-09-26coredump: initialize coredump_size in submit_coredump() (#4219)Matej Habrnal
If ulimit is smaller than page_size(), function save_external_coredump() returns -EBADSLT and this causes skipping whole core dumping part in submit_coredump(). Initializing coredump_size to UINT64_MAX prevents evaluating a condition with uninitialized varialbe which leads to calling allocate_journal_field() with coredump_fd = -1 which causes aborting. Signed-off-by: Matej Habrnal <mhabrnal@redhat.com>
2016-09-26treewide: fix typos (#4217)Torstein Husebø
2016-09-25test: add CAP_MKNOD tests for PrivateDevices=Djalal Harouni
2016-09-25core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= ↵Djalal Harouni
is set Instead of having a local syscall list, use the @raw-io group which contains the same set of syscalls to filter.
2016-09-25core:namespace: simplify ProtectHome= implementationDjalal Harouni
As with previous patch simplify ProtectHome and don't care about duplicates, they will be sorted by most restrictive mode and cleaned.
2016-09-25core: simplify ProtectSystem= implementationDjalal Harouni
ProtectSystem= with all its different modes and other options like PrivateDevices= + ProtectKernelTunables= + ProtectHome= are orthogonal, however currently it's a bit hard to parse that from the implementation view. Simplify it by giving each mode its own table with all paths and references to other Protect options. With this change some entries are duplicated, but we do not care since duplicate mounts are first sorted by the most restrictive mode then cleaned.
2016-09-25core:sandbox: add more /proc/* entries to ProtectKernelTunables=Djalal Harouni
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface, filesystems configuration and IRQ tuning readonly. Most of these interfaces now days should be in /sys but they are still available through /proc, so just protect them. This patch does not touch /proc/net/...
2016-09-25doc: explicitly document that /dev/mem and /dev/port are blocked by ↵Djalal Harouni
PrivateDevices=true
2016-09-25doc: documentation fixes for ReadWritePaths= and ProtectKernelTunables=Djalal Harouni
Documentation fixes for ReadWritePaths= and ProtectKernelTunables= as reported by Evgeny Vereshchagin.
2016-09-25core:namespace: simplify mount calculationDjalal Harouni
Move out mount calculation on its own function. Actually the logic is smart enough to later drop nop and duplicates mounts, this change improves code readability. --- src/core/namespace.c | 47 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 36 insertions(+), 11 deletions(-)
2016-09-25core:namespace: put paths protected by ProtectKernelTunables= inDjalal Harouni
Instead of having all these paths everywhere, put the ones that are protected by ProtectKernelTunables= into their own table. This way it is easy to add paths and track which ones are protected.
2016-09-25core:namespace: minor improvements to append_mounts()Djalal Harouni
2016-09-25execute: move SMACK setup code into its own functionLennart Poettering
While we are at it, move PAM code #ifdeffery into setup_pam() to simplify the main execution logic a bit.
2016-09-25namespace: drop all mounts outside of the new root directoryLennart Poettering
There's no point in mounting these, if they are outside of the root directory we'll move to.
2016-09-25main: minor simplificationLennart Poettering
2016-09-25Update TODOLennart Poettering
2016-09-25execute: filter low-level I/O syscalls if PrivateDevices= is setLennart Poettering
If device access is restricted via PrivateDevices=, let's also block the various low-level I/O syscalls at the same time, so that we know that the minimal set of devices in our virtualized /dev are really everything the unit can access.
2016-09-25NEWS: update news about systemd-udevd.serviceLennart Poettering
2016-09-25units: further lock down our long-running servicesLennart Poettering
Let's make this an excercise in dogfooding: let's turn on more security features for all our long-running services. Specifically: - Turn on RestrictRealtime=yes for all of them - Turn on ProtectKernelTunables=yes and ProtectControlGroups=yes for most of them - Turn on RestrictAddressFamilies= for all of them, but different sets of address families for each Also, always order settings in the unit files, that the various sandboxing features are close together. Add a couple of missing, older settings for a numbre of unit files. Note that this change turns off AF_INET/AF_INET6 from udevd, thus effectively turning of networking from udev rule commands. Since this might break stuff (that is already broken I'd argue) this is documented in NEWS.
2016-09-25units: permit importd to mount stuffLennart Poettering
Fixes #3996
2016-09-25man: shorten the exit status table a bitLennart Poettering
Let's merge a couple of columns, to make the table a bit shorter. This effectively just drops whitespace, not contents, but makes the currently humungous table much much more compact.
2016-09-25man: the exit code/signal is stored in $EXIT_CODE, not $EXIT_STATUSLennart Poettering
2016-09-25man: rework documentation for ReadOnlyPaths= and related settingsLennart Poettering
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=, InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to the host file system. (Which wasn't true actually, as we didn't follow symlinks at all in the most recent releases, and we know do follow them, but relative to RootDirectory=). This also replaces all references to the fact that all fs namespacing options can be undone with enough privileges and disable propagation by a single one in the documentation of ReadOnlyPaths= and friends, and then directs the read to this in all other places. Moreover a hint is added to the documentation of SystemCallFilter=, suggesting usage of ~@mount in case any of the fs namespacing related options are used.
2016-09-25man: in user-facing documentaiton don't reference C function namesLennart Poettering
Let's drop the reference to the cap_from_name() function in the documentation for the capabilities setting, as it is hardly helpful. Our readers are not necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the strings we accept are just the plain capability names as listed in capabilities(7) hence there's really no point in confusing the user with anything else.
2016-09-25namespace: don't make the root directory of a namespace a mount if it ↵Lennart Poettering
already is one Let's not stack mounts needlessly.
2016-09-25namespace: chase symlinks for mounts to set up in userspaceLennart Poettering
This adds logic to chase symlinks for all mount points that shall be created in a namespace environment in userspace, instead of leaving this to the kernel. This has the advantage that we can correctly handle absolute symlinks that shall be taken relative to a specific root directory. Moreover, we can properly handle mounts created on symlinked files or directories as we can merge their mounts as necessary. (This also drops the "done" flag in the namespace logic, which was never actually working, but was supposed to permit a partial rollback of the namespace logic, which however is only mildly useful as it wasn't clear in which case it would or would not be able to roll back.) Fixes: #3867
2016-09-25namespace: invoke unshare() only after checking all parametersLennart Poettering
Let's create the new namespace only after we validated and processed all parameters, right before we start with actually mounting things. This way, the window where we can roll back is larger (not that it matters IRL...)
2016-09-25execute: drop group priviliges only after setting up namespaceLennart Poettering
If PrivateDevices=yes is set, the namespace code creates device nodes in /dev that should be owned by the host's root, hence let's make sure we set up the namespace before dropping group privileges.
2016-09-25nspawn: let's mount /proc/sysrq-trigger read-only by defaultLennart Poettering
LXC does this, and we should probably too. Better safe than sorry.
2016-09-25core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1Lennart Poettering
Let's make sure that services that use DynamicUser=1 cannot leave files in the file system should the system accidentally have a world-writable directory somewhere. This effectively ensures that directories need to be whitelisted rather than blacklisted for access when DynamicUser=1 is set.
2016-09-25core: introduce ProtectSystem=strictLennart Poettering
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a new setting "strict". If set, the entire directory tree of the system is mounted read-only, but the API file systems /proc, /dev, /sys are excluded (they may be managed with PrivateDevices= and ProtectKernelTunables=). Also, /home and /root are excluded as those are left for ProtectHome= to manage. In this mode, all "real" file systems (i.e. non-API file systems) are mounted read-only, and specific directories may only be excluded via ReadWriteDirectories=, thus implementing an effective whitelist instead of blacklist of writable directories. While we are at, also add /efi to the list of paths always affected by ProtectSystem=. This is a follow-up for b52a109ad38cd37b660ccd5394ff5c171a5e5355 which added /efi as alternative for /boot. Our namespacing logic should respect that too.