summaryrefslogtreecommitdiff
path: root/src/basic
AgeCommit message (Collapse)Author
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
2016-10-07util: use SPECIAL_ROOT_SLICE macro where appropriateLennart Poettering
2016-10-07log: minor fixesLennart Poettering
Most important is a fix to negate the error number if necessary, before we first access it.
2016-10-07strv: fix STRV_FOREACH_BACKWARDS() to be a single statement onlyLennart Poettering
Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement of an if statement don't fall into a trap, and find the tail for the list via strv_length().
2016-10-07architecture: Add support for the RISC-V architecture. (#4305)rwmjones
RISC-V is an open source ISA in development since 2010 at UCB. For more information, see https://riscv.org/ I am adding RISC-V support to Fedora: https://fedoraproject.org/wiki/Architectures/RISC-V There are three major variants of the architecture (32-, 64- and 128-bit). The 128-bit variant is a paper exercise, but the other two really exist in silicon. RISC-V is always little endian. On Linux, the default kernel uname(2) can return "riscv" for all variants. However a patch was added recently which makes the kernel return one of "riscv32" or "riscv64" (or in future "riscv128"). So systemd should be prepared to handle any of "riscv", "riscv32" or "riscv64" (in future, "riscv128" but that is not included in the current patch). If the kernel returns "riscv" then you need to use the pointer size in order to know the real variant. The Fedora/RISC-V kernel only ever returns "riscv64" since we're only doing Fedora for 64 bit at the moment, and we've patched the kernel so it doesn't return "riscv". As well as the major bitsize variants, there are also architecture extensions. However I'm trying to ensure that uname(2) does *not* return any other information about those in utsname.machine, so that we don't end up with "riscv64abcde" nonsense. Instead those extensions will be exposed in /proc/cpuinfo similar to how flags work in x86.
2016-10-06user-util: rework maybe_setgroups() a bitLennart Poettering
Let's drop the caching of the setgroups /proc field for now. While there's a strict regime in place when it changes states, let's better not cache it since we cannot really be sure we follow that regime correctly. More importantly however, this is not in performance sensitive code, and there's no indication the cache is really beneficial, hence let's drop the caching and make things a bit simpler. Also, while we are at it, rework the error handling a bit, and always return negative errno-style error codes, following our usual coding style. This has the benefit that we can sensible hanld read_one_line_file() errors, without having to updat errno explicitly.
2016-10-06sd-device/networkd: unify code to get a socket for issuing netdev ioctls onLennart Poettering
As suggested here: https://github.com/systemd/systemd/pull/4296#issuecomment-251911349 Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that we can use a protocol-independent socket here if possible. This has the benefit that our code will still work even if AF_INET/AF_INET6 is made unavailable (for exmple via seccomp), at least on current kernels.
2016-10-06core: do not fail in a container if we can't use setgroupsGiuseppe Scrivano
It might be blocked through /proc/PID/setgroups
2016-10-06audit: disable if cannot create NETLINK_AUDIT socketGiuseppe Scrivano
2016-10-04tree-wide: remove consecutive duplicate words in commentsStefan Schweter
2016-10-04list: LIST_INSERT_BEFORE: update head if necessary (#4261)Michael Olbrich
If the new item is inserted before the first item in the list, then the head must be updated as well. Add a test to the list unit test to check for this.
2016-09-28Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1Evgeny Vereshchagin
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-27Merge pull request #4220 from keszybz/show-and-formatting-fixesMartin Pitt
Show and formatting fixes
2016-09-27basic: fix for IPv6 status (#4224)Susant Sahani
Even if ``` cat /proc/sys/net/ipv6/conf/all/disable_ipv6 1 ``` is disabled cat /proc/net/sockstat6 ``` TCP6: inuse 2 UDP6: inuse 1 UDPLITE6: inuse 0 RAW6: inuse 0 FRAG6: inuse 0 memory 0 ``` Looking for /proc/net/if_inet6 is the right choice.
2016-09-25namespace: chase symlinks for mounts to set up in userspaceLennart Poettering
This adds logic to chase symlinks for all mount points that shall be created in a namespace environment in userspace, instead of leaving this to the kernel. This has the advantage that we can correctly handle absolute symlinks that shall be taken relative to a specific root directory. Moreover, we can properly handle mounts created on symlinked files or directories as we can merge their mounts as necessary. (This also drops the "done" flag in the namespace logic, which was never actually working, but was supposed to permit a partial rollback of the namespace logic, which however is only mildly useful as it wasn't clear in which case it would or would not be able to roll back.) Fixes: #3867
2016-09-25namespace: rework how ReadWritePaths= is appliedLennart Poettering
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths= specification, then we'd first recursively apply the ReadOnlyPaths= paths, and make everything below read-only, only in order to then flip the read-only bit again for the subdirs listed in ReadWritePaths= below it. This is not only ugly (as for the dirs in question we first turn on the RO bit, only to turn it off again immediately after), but also problematic in containers, where a container manager might have marked a set of dirs read-only and this code will undo this is ReadWritePaths= is set for any. With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be applied to the children listed in ReadWritePaths= in the first place, so that we do not need to turn off the RO bit for those after all. This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the RO bit, but never to turn it off again. Or to say this differently: if some dirs are marked read-only via some external tool, then ReadWritePaths= will not undo it. This is not only the safer option, but also more in-line with what the man page currently claims: "Entries (files or directories) listed in ReadWritePaths= are accessible from within the namespace with the same access rights as from outside." To implement this change bind_remount_recursive() gained a new "blacklist" string list parameter, which when passed may contain subdirs that shall be excluded from the read-only mounting. A number of functions are updated to add more debug logging to make this more digestable.
2016-09-25execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.cLennart Poettering
This adds a new call get_user_creds_clean(), which is just like get_user_creds() but returns NULL in the home/shell parameters if they contain no useful information. This code previously lived in execute.c, but by generalizing this we can reuse it in run.c.
2016-09-24basic/strv: add STRPTR_IN_SETZbigniew Jędrzejewski-Szmek
Also some trivial tests for STR_IN_SET and STRPTR_IN_SET.
2016-09-17Merge pull request #4123 from keszybz/network-file-dropinsMartin Pitt
Network file dropins
2016-09-16tree-wide: rename config_parse_many to …_nulstrZbigniew Jędrzejewski-Szmek
In preparation for adding a version which takes a strv.
2016-09-15Merge pull request #4131 from intelfx/update-done-timestamps-precisionZbigniew Jędrzejewski-Szmek
condition: ignore nanoseconds in timestamps for ConditionNeedsUpdate= Fixes #4130.
2016-09-15time-util: export timespec_load_nsec()Ivan Shapovalov
2016-09-14Merge pull request #4133 from keszybz/strerror-removalMartin Pitt
Strerror removal and other janitorial cleanups
2016-09-13Always use unicode ellipsis when ellipsizingZbigniew Jędrzejewski-Szmek
We were already unconditionally using the unicode character when the input string was not pure ASCII, leading to different behaviour in depending on the input string. systemd[1]: Starting printit.service. python3[19962]: foooooooooooooooooooooooooooooooooooo…oooo python3[19964]: fooąęoooooooooooooooooooooooooooooooo…oooo python3[19966]: fooąęoooooooooooooooooooooooooooooooo…ąęąę python3[19968]: fooąęoooooooooooooooooąęąęąęąęąęąęąęą…ąęąę systemd[1]: Started printit.service.
2016-09-13fileio: simplify mkostemp_safe() (#4090)Topi Miettinen
According to its manual page, flags given to mkostemp(3) shouldn't include O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond those, the only flag that all callers (except a few tests where it probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
2016-08-31machinectl: split OS field in two; print ip addresses (#4058)Seraphime Kirkovski
This splits the OS field in two : one for the distribution name and one for the the version id. Dashes are written for missing fields. This also prints ip addresses of known machines. The `--max-addresses` option specifies how much ip addresses we want to see. The default is 1. When more than one address is written for a machine, a `,` follows it. If there are more ips than `--max-addresses`, `...` follows the last address.
2016-08-29basic/fileio: we always have O_TMPFILE nowYann E. MORIN
fileio makes use of O_TMPFILE when it is available. We now always have O_TMPFILE, defined in missing.h if missing from the toolchain headers. Have fileio include missing.h and drop the guards around the use of O_TMPFILE.
2016-08-29missing.h: add missing definitions for __O_TMPFILEYann E. MORIN
Currently, a missing __O_TMPFILE was only defined for i386 and x86_64, leaving any other architectures with an "old" toolchain fail miserably at build time: src/import/export-raw.c: In function 'reflink_snapshot': src/import/export-raw.c:271:26: error: 'O_TMPFILE' undeclared (first use in this function) new_fd = open(d, O_TMPFILE|O_CLOEXEC|O_NOCTTY|O_RDWR, 0600); ^ __O_TMPFILE (and O_TMPFILE) are available since glibc 2.19. However, a lot of existing toolchains are still using glibc-2.18, and some even before that, and it is not really possible to update those toolchains. Instead of defining it only for i386 and x86_64, define __O_TMPFILE with the specific values for those archs where it is different from the generic value. Use the values as found in the Linux kernel (v4.8-rc3, current as of time of commit). --- Note: tested on ARM (build+run), with glibc-2.18 and linux headers 3.12. Untested on other archs, though (I have no board to test this). Changes v1 -> v2: - add a comment specifying some are hexa, others are octal.
2016-08-19Merge pull request #3965 from htejun/systemd-controller-on-unifiedZbigniew Jędrzejewski-Szmek
2016-08-19terminal-util: remove unnecessary check of result of isatty() (#4000)0xAX
After the call of the isatty() we check its result twice in the open_terminal(). There are no sense to check result of isatty() that it is less than zero and return -errno, because as described in documentation: isatty() returns 1 if fd is an open file descriptor referring to a terminal; otherwise 0 is returned, and errno is set to indicate the error. So it can't be less than zero.
2016-08-19Merge pull request #3909 from poettering/mount-toolEvgeny Vereshchagin
add a new tool for creating transient mount and automount units
2016-08-19Merge pull request #3987 from keszybz/console-color-setupLennart Poettering
Rework console color setup
2016-08-19terminal-util: use getenv_bool for $SYSTEMD_COLORSZbigniew Jędrzejewski-Szmek
This changes the semantics a bit: before, SYSTEMD_COLORS= would be treated as "yes", same as SYSTEMD_COLORS=xxx and SYSTEMD_COLORS=1, and only SYSTEMD_COLORS=0 would be treated as "no". Now, only valid booleans are treated as "yes". This actually matches how $SYSTEMD_COLORS was announced in NEWS.
2016-08-19systemd: ignore lack of tty when checking whether colors should be enabledZbigniew Jędrzejewski-Szmek
When started by the kernel, we are connected to the console, and we'll set TERM properly to some value in fixup_environment(). We'll then enable or disable colors based on the value of $SYSTEMD_COLORS and $TERM. When reexecuting, TERM should be already set, so we can use this value. Effectively, behaviour is the same as before affd7ed1a was reverted, but instead of reopening the console before configuring color output, we just ignore what stdout is connected to and decide based on the variables only.
2016-08-19Merge pull request #3988 from keszybz/journald-dynamic-usersLennart Poettering
Journald dynamic users
2016-08-18journald: do not create split journals for dynamic usersZbigniew Jędrzejewski-Szmek
Dynamic users should be treated like system users, and their logs should end up in the main system journal.
2016-08-18logind: update empty and "infinity" handling for [User]TasksMax (#3835)Tejun Heo
The parsing functions for [User]TasksMax were inconsistent. Empty string and "infinity" were interpreted as no limit for TasksMax but not accepted for UserTasksMax. Update them so that they're consistent with other knobs. * Empty string indicates the default value. * "infinity" indicates no limit. While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax handling. v2: Update empty string to indicate the default value as suggested by Zbigniew Jędrzejewski-Szmek. v3: Fixed empty UserTasksMax handling.
2016-08-18hostnamectl: rework pretty hostname validation (#3985)Lennart Poettering
Rework 17eb9a9ddba3f03fcba33445c1c1eedeb948da04 a bit. Let's make sure we don't clobber the input parameter args[1], following our coding style to not clobber parameters unless explicitly indicated. (in particular, as we don't want to have our changes appear in the command line shown in "ps"...) No functional change.
2016-08-18add a new tool for creating transient mount and automount unitsLennart Poettering
This adds "systemd-mount" which is for transient mount and automount units what "systemd-run" is for transient service, scope and timer units. The tool allows establishing mounts and automounts during runtime. It is very similar to the usual /bin/mount commands, but can pull in additional dependenices on access (for example, it pulls in fsck automatically), an take benefit of the automount logic. This tool is particularly useful for mount removable file systems (such as USB sticks), as the automount logic (together with automatic unmount-on-idle), as well as automatic fsck on first access ensure that the removable file system has a high chance to remain in a fully clean state even when it is unplugged abruptly, and returns to a clean state on the next re-plug. This is a follow-up for #2471, as it adds a simple client-side for the transient automount logic added in that PR. In later work it might make sense to invoke this tool automatically from udev rules in order to implement a simpler and safer version of removable media management á la udisks.
2016-08-17core: use the unified hierarchy for the systemd cgroup controller hierarchyTejun Heo
Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
2016-08-15core: rename cg_unified() to cg_all_unified()Tejun Heo
A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.
2016-08-14Merge pull request #3905 from htejun/cgroup-v2-cpuZbigniew Jędrzejewski-Szmek
core: add cgroup CPU controller support on the unified hierarchy (zj: merging not squashing to make it clear against which upstream this patch was developed.)
2016-08-07core: add cgroup CPU controller support on the unified hierarchyTejun Heo
Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.
2016-08-05Merge pull request #3818 from poettering/exit-status-envZbigniew Jędrzejewski-Szmek
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
2016-08-05Merge pull request #3900 from keszybz/fix-3607Lennart Poettering
Fix 3607
2016-08-05util-lib: unify parsing of nice level valuesLennart Poettering
This adds parse_nice() that parses a nice level and ensures it is in the right range, via a new nice_is_valid() helper. It then ports over a number of users to this. No functional changes.
2016-08-05fileio: fix MIN/MAX mixup (#3896)Vito Caputo
The intention is to clamp the value to READ_FULL_BYTES_MAX, which would be the minimum of the two.
2016-08-04basic/set: remove some spurious spacesZbigniew Jędrzejewski-Szmek
2016-08-04fileio: fix read_full_stream() bugs (#3887)Vito Caputo
read_full_stream() _always_ allocated twice the memory needed, due to only breaking the realloc() && fread() loop when fread() returned 0, requiring another iteration and exponentially enlarged buffer just to discover the EOF condition. This also caused file sizes >2MiB && <= 4MiB to erroneously be treated as E2BIG, due to the inappropriately doubled buffer size exceeding 4*1024*1024. Also made the 4*1024*1024 magic number a READ_FULL_BYTES_MAX constant.
2016-08-04util-lib: rework /tmp and /var/tmp handling codeLennart Poettering
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a matching tmp_dir() call (the former looks for the place for /var/tmp, the latter for /tmp). Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses. All dirs are validated before use. secure_getenv() is used in order to limite exposure in suid binaries. This also ports a couple of users over to these new APIs. The var_tmp() return parameter is changed from an allocated buffer the caller will own to a const string either pointing into environ[], or into a static const buffer. Given that environ[] is mostly considered constant (and this is exposed in the very well-known getenv() call), this should be OK behaviour and allows us to avoid memory allocations in most cases. Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.