summaryrefslogtreecommitdiff
path: root/src/basic
AgeCommit message (Collapse)Author
2016-12-17basic/virt: fix userns check on CONFIG_USER_NS=n kernel (#4651)Zbigniew Jędrzejewski-Szmek
ENOENT should be treated as "false", but because of the broken errno check it was treated as an error. So ConditionVirtualization=user-namespaces probably returned the correct answer, but only by accident. Fixes #4608.
2016-12-17core: don't use the unified hierarchy for the systemd cgroup yet (#4628)Martin Pitt
Too many things don't get along with the unified hierarchy yet: * https://github.com/opencontainers/runc/issues/1175 * https://github.com/docker/docker/issues/28109 * https://github.com/lxc/lxc/issues/1280 So revert the default to the legacy hierarchy for now. Developers of the above software can opt into the unified hierarchy with "systemd.legacy_systemd_cgroup_controller=0".
2016-11-02Revert some uses of xsprintfZbigniew Jędrzejewski-Szmek
This reverts some changes introduced in d054f0a4d4. xsprintf should be used in cases where we calculated the right buffer size by hand (using DECIMAL_STRING_MAX and such), and never in cases where we are printing externally specified strings of arbitrary length. Fixes #4534.
2016-11-02core: make the root mount perpetual tooLennart Poettering
Now that have a proper concept of "perpetual" units, let's make the root mount one too, since it also cannot go away.
2016-11-01Recognise Lustre as a remote file system (#4530)Brian J. Murrell
Lustre is also a remote file system that wants the network to be up before it is mounted.
2016-10-30tests: clarify test_path_startswith return value (#4508)Zbigniew Jędrzejewski-Szmek
A pendant for #4481.
2016-10-27Merge pull request #4442 from keszybz/detect-virt-usernsEvgeny Vereshchagin
detect-virt: add --private-users switch to check if a userns is active; add Condition=private-users
2016-10-26detect-virt: add --private-users switch to check if a userns is activeZbigniew Jędrzejewski-Szmek
Various things don't work when we're running in a user namespace, but it's pretty hard to reliably detect if that is true. A function is added which looks at /proc/self/uid_map and returns false if the default "0 0 UINT32_MAX" is found, and true if it finds anything else. This misses the case where an 1:1 mapping with the full range was used, but I don't know how to distinguish this case. 'systemd-detect-virt --private-users' is very similar to 'systemd-detect-virt --chroot', but we check for a user namespace instead.
2016-10-24Merge pull request #4459 from keszybz/commandline-parsingLennart Poettering
Commandline parsing simplification and udev fix
2016-10-23basic: fallback to the fstat if we don't have access to the /proc/self/fdinfoEvgeny Vereshchagin
https://github.com/systemd/systemd/pull/4372#discussion_r83354107: I get `open("/proc/self/fdinfo/13", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)` 327 mkdir("/proc", 0755 <unfinished ...> 327 <... mkdir resumed> ) = -1 EEXIST (File exists) 327 stat("/proc", <unfinished ...> 327 <... stat resumed> {st_dev=makedev(8, 1), st_ino=28585, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=4, st_size=1024, st_atime=2016/10/14-02:55:32, st_mtime=2016/ 327 mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL <unfinished ...> 327 <... mount resumed> ) = 0 327 lstat("/proc", <unfinished ...> 327 <... lstat resumed> {st_dev=makedev(0, 34), st_ino=1, st_mode=S_IFDIR|0555, st_nlink=75, st_uid=65534, st_gid=65534, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2016/10/14-03:13:35.971031263, 327 lstat("/proc/sys", {st_dev=makedev(0, 34), st_ino=4026531855, st_mode=S_IFDIR|0555, st_nlink=1, st_uid=65534, st_gid=65534, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2016/10/14-03:13:39.1630 327 openat(AT_FDCWD, "/proc", O_RDONLY|O_DIRECTORY|O_CLOEXEC|O_PATH) = 11</proc> 327 name_to_handle_at(11</proc>, "sys", {handle_bytes=128}, 0x7ffe3a238604, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported) 327 name_to_handle_at(11</proc>, "", {handle_bytes=128}, 0x7ffe3a238608, AT_EMPTY_PATH) = -1 EOPNOTSUPP (Operation not supported) 327 openat(11</proc>, "sys", O_RDONLY|O_CLOEXEC|O_PATH) = 13</proc/sys> 327 open("/proc/self/fdinfo/13", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied) 327 close(13</proc/sys> <unfinished ...> 327 <... close resumed> ) = 0 327 close(11</proc> <unfinished ...> 327 <... close resumed> ) = 0 -bash-4.3# ls -ld /proc/ dr-xr-xr-x 76 65534 65534 0 Oct 14 02:57 /proc/ -bash-4.3# ls -ld /proc/1 dr-xr-xr-x 9 root root 0 Oct 14 02:57 /proc/1 -bash-4.3# ls -ld /proc/1/fdinfo dr-x------ 2 65534 65534 0 Oct 14 03:00 /proc/1/fdinfo
2016-10-22tree-wide: make parse_proc_cmdline() strip "rd." prefix automaticallyZbigniew Jędrzejewski-Szmek
This stripping is contolled by a new boolean parameter. When the parameter is true, it means that the caller does not care about the distinction between initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed parameters in the initramfs, and only on the unprefixed parameters in real root. If the parameter is false, behaviour is the same as before. Changes by caller: log.c (systemd.log_*): changed to accept rd-dot-prefix params pid1: no change, custom logic cryptsetup-generator: no change, still accepts rd-dot-prefix params debug-generator: no change, does not accept rd-dot-prefix params fsck: changed to accept rd-dot-prefix params fstab-generator: no change, custom logic gpt-auto-generator: no change, custom logic hibernate-resume-generator: no change, does not accept rd-dot-prefix params journald: changed to accept rd-dot-prefix params modules-load: no change, still accepts rd-dot-prefix params quote-check: no change, does not accept rd-dot-prefix params udevd: no change, still accepts rd-dot-prefix params I added support for "rd." params in the three cases where I think it's useful: logging, fsck options, journald forwarding options.
2016-10-22tree-wide: allow state to be passed through to parse_proc_cmdline_itemZbigniew Jędrzejewski-Szmek
No functional change.
2016-10-18Merge pull request #4382 from keszybz/unit-type-underlineLennart Poettering
systemctl: use underlines to seperate unit types in listing
2016-10-17terminal-util: helper macro for highlighting functionsZbigniew Jędrzejewski-Szmek
2016-10-17systemctl: use underlines to seperate unit types in listingZbigniew Jędrzejewski-Szmek
(printf("%.*s", -1, "…") is the same as not specifying the precision at all.) v2: also underline highlighted (failing) units Fixes #4137.
2016-10-16tree-wide: introduce free_and_replace helperZbigniew Jędrzejewski-Szmek
It's a common pattern, so add a helper for it. A macro is necessary because a function that takes a pointer to a pointer would be type specific, similarly to cleanup functions. Seems better to use a macro.
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-15virt: add possibility to skip the check for chroot (#4374)Lukáš Nykrýn
https://bugzilla.redhat.com/show_bug.cgi?id=1379852
2016-10-13Merge pull request #4363 from stefan-it/replace-while-loopsDaniel Mack
basic,coredump: use for loop instead of while
2016-10-13nspawn: cleanup and chown the synced cgroup hierarchy (#4223)Evgeny Vereshchagin
Fixes: #4181
2016-10-12basic: use for() loop instead of while()Stefan Schweter
2016-10-12Merge pull request #4351 from keszybz/nspawn-debuggingLennart Poettering
Enhance nspawn debug logs for mount/unmount operations
2016-10-12Allow block and char classes in DeviceAllow bus properties (#4353)Zbigniew Jędrzejewski-Szmek
Allowed paths are unified betwen the configuration file parses and the bus property checker. The biggest change is that the bus code now allows "block-" and "char-" classes. In addition, path_startswith("/dev") was used in the bus code, and startswith("/dev") was used in the config file code. It seems reasonable to use path_startswith() which allows a slightly broader class of strings. Fixes #3935.
2016-10-11missing: add a bunch of mount flagsZbigniew Jędrzejewski-Szmek
2016-10-11nspawn,mount-util: add [u]mount_verbose and use it in nspawnZbigniew Jędrzejewski-Szmek
This makes it easier to debug failed nspawn invocations: Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")... Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Bind-mounting /proc/sys on /proc/sys (MS_BIND "")... Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")... Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")... Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")... Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")... Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11Merge pull request #4067 from poettering/invocation-idZbigniew Jędrzejewski-Szmek
Add an "invocation ID" concept to the service manager
2016-10-10core: when determining whether a process exit status is clean, consider ↵Lennart Poettering
whether it is a command or a daemon SIGTERM should be considered a clean exit code for daemons (i.e. long-running processes, as a daemon without SIGTERM handler may be shut down without issues via SIGTERM still) while it should not be considered a clean exit code for commands (i.e. short-running processes). Let's add two different clean checking modes for this, and use the right one at the appropriate places. Fixes: #4275
2016-10-10exit-status: kill is_clean_exit_lsb(), move logic to sysv-generatorLennart Poettering
Let's get rid of is_clean_exit_lsb(), let's move the logic for the special handling of the two LSB exit codes into the sysv-generator by writing out appropriate SuccessExitStatus= lines if the LSB header exists. This is not only semantically more correct, bug also fixes a bug as the code in service.c that chose between is_clean_exit_lsb() and is_clean_exit() based this check on whether a native unit files was available for the unit. However, that check was bogus since a long time, since the SysV generator was introduced and native SysV script support was removed from PID 1, as in that case a unit file always existed.
2016-10-10exit-status: reorder the exit status switch tableLennart Poettering
Let's make sure it's in the same order as the actual enum defining the exit statuses.
2016-10-10exit-status: remove ExitStatus typedefLennart Poettering
Do not make up our own type for ExitStatus, but use the type used by POSIX for this, which is "int". In particular as we never used that type outside of the definition of exit_status_to_string() where we internally cast the paramter to (int) every single time we used it. Hence, let's simplify things, drop the type and use the kernel type directly.
2016-10-10Merge pull request #4310 from keszybz/nspawn-autodetectEvgeny Vereshchagin
Autodetect systemd version in containers started by systemd-nspawn
2016-10-08path-util: add a function to peek into a container and guess systemd versionZbigniew Jędrzejewski-Szmek
This is a bit crude and only works for new systemd versions which have libsystemd-shared.
2016-10-08networkd: address add support to configure flags (#4201)Susant Sahani
This patch enables to configure IFA_F_HOMEADDRESS IFA_F_NODAD IFA_F_MANAGETEMPADDR IFA_F_NOPREFIXROUTE IFA_F_MCAUTOJOIN
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
2016-10-07util: use SPECIAL_ROOT_SLICE macro where appropriateLennart Poettering
2016-10-07log: minor fixesLennart Poettering
Most important is a fix to negate the error number if necessary, before we first access it.
2016-10-07strv: fix STRV_FOREACH_BACKWARDS() to be a single statement onlyLennart Poettering
Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement of an if statement don't fall into a trap, and find the tail for the list via strv_length().
2016-10-07architecture: Add support for the RISC-V architecture. (#4305)rwmjones
RISC-V is an open source ISA in development since 2010 at UCB. For more information, see https://riscv.org/ I am adding RISC-V support to Fedora: https://fedoraproject.org/wiki/Architectures/RISC-V There are three major variants of the architecture (32-, 64- and 128-bit). The 128-bit variant is a paper exercise, but the other two really exist in silicon. RISC-V is always little endian. On Linux, the default kernel uname(2) can return "riscv" for all variants. However a patch was added recently which makes the kernel return one of "riscv32" or "riscv64" (or in future "riscv128"). So systemd should be prepared to handle any of "riscv", "riscv32" or "riscv64" (in future, "riscv128" but that is not included in the current patch). If the kernel returns "riscv" then you need to use the pointer size in order to know the real variant. The Fedora/RISC-V kernel only ever returns "riscv64" since we're only doing Fedora for 64 bit at the moment, and we've patched the kernel so it doesn't return "riscv". As well as the major bitsize variants, there are also architecture extensions. However I'm trying to ensure that uname(2) does *not* return any other information about those in utsname.machine, so that we don't end up with "riscv64abcde" nonsense. Instead those extensions will be exposed in /proc/cpuinfo similar to how flags work in x86.
2016-10-06user-util: rework maybe_setgroups() a bitLennart Poettering
Let's drop the caching of the setgroups /proc field for now. While there's a strict regime in place when it changes states, let's better not cache it since we cannot really be sure we follow that regime correctly. More importantly however, this is not in performance sensitive code, and there's no indication the cache is really beneficial, hence let's drop the caching and make things a bit simpler. Also, while we are at it, rework the error handling a bit, and always return negative errno-style error codes, following our usual coding style. This has the benefit that we can sensible hanld read_one_line_file() errors, without having to updat errno explicitly.
2016-10-06sd-device/networkd: unify code to get a socket for issuing netdev ioctls onLennart Poettering
As suggested here: https://github.com/systemd/systemd/pull/4296#issuecomment-251911349 Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that we can use a protocol-independent socket here if possible. This has the benefit that our code will still work even if AF_INET/AF_INET6 is made unavailable (for exmple via seccomp), at least on current kernels.
2016-10-06core: do not fail in a container if we can't use setgroupsGiuseppe Scrivano
It might be blocked through /proc/PID/setgroups
2016-10-06audit: disable if cannot create NETLINK_AUDIT socketGiuseppe Scrivano
2016-10-04tree-wide: remove consecutive duplicate words in commentsStefan Schweter
2016-10-04list: LIST_INSERT_BEFORE: update head if necessary (#4261)Michael Olbrich
If the new item is inserted before the first item in the list, then the head must be updated as well. Add a test to the list unit test to check for this.
2016-09-28Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1Evgeny Vereshchagin
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-27Merge pull request #4220 from keszybz/show-and-formatting-fixesMartin Pitt
Show and formatting fixes
2016-09-27basic: fix for IPv6 status (#4224)Susant Sahani
Even if ``` cat /proc/sys/net/ipv6/conf/all/disable_ipv6 1 ``` is disabled cat /proc/net/sockstat6 ``` TCP6: inuse 2 UDP6: inuse 1 UDPLITE6: inuse 0 RAW6: inuse 0 FRAG6: inuse 0 memory 0 ``` Looking for /proc/net/if_inet6 is the right choice.
2016-09-25namespace: chase symlinks for mounts to set up in userspaceLennart Poettering
This adds logic to chase symlinks for all mount points that shall be created in a namespace environment in userspace, instead of leaving this to the kernel. This has the advantage that we can correctly handle absolute symlinks that shall be taken relative to a specific root directory. Moreover, we can properly handle mounts created on symlinked files or directories as we can merge their mounts as necessary. (This also drops the "done" flag in the namespace logic, which was never actually working, but was supposed to permit a partial rollback of the namespace logic, which however is only mildly useful as it wasn't clear in which case it would or would not be able to roll back.) Fixes: #3867
2016-09-25namespace: rework how ReadWritePaths= is appliedLennart Poettering
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths= specification, then we'd first recursively apply the ReadOnlyPaths= paths, and make everything below read-only, only in order to then flip the read-only bit again for the subdirs listed in ReadWritePaths= below it. This is not only ugly (as for the dirs in question we first turn on the RO bit, only to turn it off again immediately after), but also problematic in containers, where a container manager might have marked a set of dirs read-only and this code will undo this is ReadWritePaths= is set for any. With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be applied to the children listed in ReadWritePaths= in the first place, so that we do not need to turn off the RO bit for those after all. This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the RO bit, but never to turn it off again. Or to say this differently: if some dirs are marked read-only via some external tool, then ReadWritePaths= will not undo it. This is not only the safer option, but also more in-line with what the man page currently claims: "Entries (files or directories) listed in ReadWritePaths= are accessible from within the namespace with the same access rights as from outside." To implement this change bind_remount_recursive() gained a new "blacklist" string list parameter, which when passed may contain subdirs that shall be excluded from the read-only mounting. A number of functions are updated to add more debug logging to make this more digestable.
2016-09-25execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.cLennart Poettering
This adds a new call get_user_creds_clean(), which is just like get_user_creds() but returns NULL in the home/shell parameters if they contain no useful information. This code previously lived in execute.c, but by generalizing this we can reuse it in run.c.