summaryrefslogtreecommitdiff
path: root/src/core
AgeCommit message (Collapse)Author
2016-10-24seccomp: add new seccomp_init_conservative() helperLennart Poettering
This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence aded additional "*" here and there. Remove that.
2016-10-24core: rework apply_protect_kernel_modules() to use ↵Lennart Poettering
seccomp_add_syscall_filter_set() Let's simplify this call, by making use of the new infrastructure. This is actually more in line with Djalal's original patch but instead of search the filter set in the array by its name we can now use the set index and jump directly to it.
2016-10-24core: rework syscall filter set handlingLennart Poettering
A variety of fixes: - rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main instance of it (the syscall_filter_sets[] array) used to abbreviate "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not mix and match too wildly. Let's pick the shorter name in this case, as it is sufficiently well established to not confuse hackers reading this. - Export explicit indexes into the syscall_filter_sets[] array via an enum. This way, code that wants to make use of a specific filter set, can index it directly via the enum, instead of having to search for it. This makes apply_private_devices() in particular a lot simpler. - Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to find a set by its name, seccomp_add_syscall_filter_set() to add a set to a seccomp object. - Update SystemCallFilter= parser to use extract_first_word(). Let's work on deprecating FOREACH_WORD_QUOTED(). - Simplify apply_private_devices() using this functionality
2016-10-24core: move misplaced comment to the right placeLennart Poettering
2016-10-24core: simplify skip_seccomp_unavailable() a bitLennart Poettering
Let's prefer early-exit over deep-indented if blocks. Not behavioural change.
2016-10-24Merge pull request #4459 from keszybz/commandline-parsingLennart Poettering
Commandline parsing simplification and udev fix
2016-10-24Merge pull request #4406 from jsynacek/jsynacek-is-enabledLennart Poettering
shared, systemctl: teach is-enabled to show install targets
2016-10-24core: do not assert when sysconf(_SC_NGROUPS_MAX) fails (#4466)Djalal Harouni
Remove the assert and check the return code of sysconf(_SC_NGROUPS_MAX). _SC_NGROUPS_MAX maps to NGROUPS_MAX which is defined in <limits.h> to 65536 these days. The value is a sysctl read-only /proc/sys/kernel/ngroups_max and the kernel assumes that it is always positive otherwise things may break. Follow this and support only positive values for all other case return either -errno or -EOPNOTSUPP. Now if there are systems that want to re-write NGROUPS_MAX then they should not pass SupplementaryGroups= in units even if it is empty, in this case nothing fails and we just ignore supplementary groups. However if SupplementaryGroups= is passed even if it is empty we have to assume that there will be groups manipulation from our side or the kernel and since the kernel always assumes that NGROUPS_MAX is positive, then follow that and support only positive values.
2016-10-24shared, systemctl: teach is-enabled to show installation targetsJan Synacek
It may be desired by users to know what targets a particular service is installed into. Improve user friendliness by teaching the is-enabled command to show such information when used with --full. This patch makes use of the newly added UnitFileFlags and adds UNIT_FILE_DRY_RUN flag into it. Since the API had already been modified, it's now easy to add the new dry-run feature for other commands as well. As a next step, --dry-run could be added to systemctl, which in turn might pave the way for a long requested dry-run feature when running systemctl start.
2016-10-24install: introduce UnitFileFlagsJan Synacek
Introduce a new enum to get rid of some boolean arguments of unit_file_* functions. It unifies the code, makes it a bit cleaner and extensible.
2016-10-23core: lets move the setup of working directory before group enforceDjalal Harouni
This is minor but lets try to split and move bit by bit cgroups and portable environment setup before applying the security context.
2016-10-23core: first lookup and cache creds then apply them after namespace setupDjalal Harouni
This fixes: https://github.com/systemd/systemd/issues/4357 Let's lookup and cache creds then apply them. We also switch from getgroups() to getgrouplist().
2016-10-22Merge pull request #4428 from lnykryn/ctrl_v2Zbigniew Jędrzejewski-Szmek
rename failure-action to emergency-action and use it for ctrl+alt+del burst
2016-10-22tree-wide: make parse_proc_cmdline() strip "rd." prefix automaticallyZbigniew Jędrzejewski-Szmek
This stripping is contolled by a new boolean parameter. When the parameter is true, it means that the caller does not care about the distinction between initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed parameters in the initramfs, and only on the unprefixed parameters in real root. If the parameter is false, behaviour is the same as before. Changes by caller: log.c (systemd.log_*): changed to accept rd-dot-prefix params pid1: no change, custom logic cryptsetup-generator: no change, still accepts rd-dot-prefix params debug-generator: no change, does not accept rd-dot-prefix params fsck: changed to accept rd-dot-prefix params fstab-generator: no change, custom logic gpt-auto-generator: no change, custom logic hibernate-resume-generator: no change, does not accept rd-dot-prefix params journald: changed to accept rd-dot-prefix params modules-load: no change, still accepts rd-dot-prefix params quote-check: no change, does not accept rd-dot-prefix params udevd: no change, still accepts rd-dot-prefix params I added support for "rd." params in the three cases where I think it's useful: logging, fsck options, journald forwarding options.
2016-10-22tree-wide: allow state to be passed through to parse_proc_cmdline_itemZbigniew Jędrzejewski-Szmek
No functional change.
2016-10-21core: use emergency_action for ctr+alt+del burstLukas Nykryn
Fixes #4306
2016-10-21failure-action: generalize failure action to emergency actionLukas Nykryn
2016-10-21core: if the start command vanishes during runtime don't hit an assertLennart Poettering
This can happen when the configuration is changed and reloaded while we are executing a service. Let's not hit an assert in this case. Fixes: #4444
2016-10-20journald,core: add short comments we we keep reopening /dev/console all the timeLennart Poettering
Just to make sure the next one reading this isn't surprised that the fd isn't kept open. SAK and stuff... Fix suggested: https://github.com/systemd/systemd/pull/4366#issuecomment-253659162
2016-10-20Merge pull request #4417 from keszybz/man-and-rlimitLennart Poettering
Two unrelated patches: man page tweaks and rlimit log levels
2016-10-19pid1: downgrade some rlimit warningsZbigniew Jędrzejewski-Szmek
Since we ignore the result anyway, downgrade errors to warning. log_oom() will still emit an error, but that's mostly theoretical, so it is not worth complicating the code to avoid the small inconsistency
2016-10-19core: let's upgrade the log level for service processes dying of signal (#4415)Lennart Poettering
As suggested in https://github.com/systemd/systemd/pull/4367#issuecomment-253670328
2016-10-17core/exec: add a named-descriptor option ("fd") for streams (#4179)Luca Bruno
This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.
2016-10-17Merge pull request #4392 from keszybz/running-timersLennart Poettering
Fix for display of elapsed timers
2016-10-17core/timer: reset next_elapse_*time when timer is not waitingZbigniew Jędrzejewski-Szmek
When the unit that is triggered by a timer is started and running, we transition to "running" state, and the timer will not elapse again until the unit has finished running. In this state "systemctl list-timers" would display the previously calculated next elapse time, which would now of course be in the past, leading to nonsensical values. Simply set the next elapse to infinity, which causes list-timers to show n/a. We cannot specify when the next elapse will happen, possibly never. Fixes #4031.
2016-10-17pid1: do not use mtime==0 as sign of masking (#4388)Zbigniew Jędrzejewski-Szmek
It is allowed for unit files to have an mtime==0, so instead of assuming that any file that had mtime==0 was masked, use the load_state to filter masked units. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1384150.
2016-10-16tree-wide: introduce free_and_replace helperZbigniew Jędrzejewski-Szmek
It's a common pattern, so add a helper for it. A macro is necessary because a function that takes a pointer to a pointer would be type specific, similarly to cleanup functions. Seems better to use a macro.
2016-10-16tree-wide: use mfree moreZbigniew Jędrzejewski-Szmek
2016-10-14core: make settings for unified cgroup hierarchy supersede the ones for ↵Tejun Heo
legacy hierarchy (#4269) There are overlapping control group resource settings for the unified and legacy hierarchies. To help transition, the settings are translated back and forth. When both versions of a given setting are present, the one matching the cgroup hierarchy type in use is used. Unfortunately, this is more confusing to use and document than necessary because there is no clear static precedence. Update the translation logic so that the settings for the unified hierarchy are always preferred. systemd.resource-control man page is updated to reflect the change and reorganized so that the deprecated settings are at the end in its own section.
2016-10-12core: make sure to dump ProtectKernelModules= valueDjalal Harouni
2016-10-12core: check protect_kernel_modules and private_devices in order to setup NNPDjalal Harouni
2016-10-12core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=Djalal Harouni
Lets go further and make /lib/modules/ inaccessible for services that do not have business with modules, this is a minor improvment but it may help on setups with custom modules and they are limited... in regard of kernel auto-load feature. This change introduce NameSpaceInfo struct which we may embed later inside ExecContext but for now lets just reduce the argument number to setup_namespace() and merge ProtectKernelModules feature.
2016-10-12core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yesDjalal Harouni
The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw data through /proc, ioctl and some other exotic system calls...
2016-10-12core:sandbox: Add ProtectKernelModules= optionDjalal Harouni
This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.
2016-10-12Allow block and char classes in DeviceAllow bus properties (#4353)Zbigniew Jędrzejewski-Szmek
Allowed paths are unified betwen the configuration file parses and the bus property checker. The biggest change is that the bus code now allows "block-" and "char-" classes. In addition, path_startswith("/dev") was used in the bus code, and startswith("/dev") was used in the config file code. It seems reasonable to use path_startswith() which allows a slightly broader class of strings. Fixes #3935.
2016-10-11core/main: get rid from excess check of ACTION_TEST (#4350)0xAX
If `--test` command line option was passed, the systemd set skip_setup to true during bootup. But after this we check again that arg_action is test or help and opens pager depends on result. We should skip setup in a case when `--test` is passed, but it is also safe to set skip_setup in a case of `--help`. So let's remove first check and move skip_setup = true to the second check.
2016-10-11core: chown() any TTY used for stdin, not just when StandardInput=tty is ↵Lennart Poettering
used (#4347) If stdin is supplied as an fd for transient units (using the StandardInputFileDescriptor pseudo-property for transient units), then we should also fix up the TTY ownership, not just when we opened the TTY ourselves. This simply drops the explicit is_terminal_input()-based check. Note that chown_terminal() internally does a much more appropriate isatty()-based check anyway, hence we can drop this without replacement. Fixes: #4260
2016-10-11Merge pull request #4067 from poettering/invocation-idZbigniew Jędrzejewski-Szmek
Add an "invocation ID" concept to the service manager
2016-10-10Merge pull request #4337 from poettering/exit-codeZbigniew Jędrzejewski-Szmek
Fix for #4275 and more
2016-10-10core: simplify if branches a bitLennart Poettering
We do the same thing in two branches, let's merge them. Let's also add an explanatory comment, while we are at it.
2016-10-10core: make use of IN_SET() in various places in mount.cLennart Poettering
2016-10-10core: when determining whether a process exit status is clean, consider ↵Lennart Poettering
whether it is a command or a daemon SIGTERM should be considered a clean exit code for daemons (i.e. long-running processes, as a daemon without SIGTERM handler may be shut down without issues via SIGTERM still) while it should not be considered a clean exit code for commands (i.e. short-running processes). Let's add two different clean checking modes for this, and use the right one at the appropriate places. Fixes: #4275
2016-10-10core: lower exit status "level" at one placeLennart Poettering
When we print information about PID 1's crashdump subprocess failing. In this case we *know* that we do not generate LSB exit codes, as it's basically PID 1 itself that exited there.
2016-10-10main: use strdup instead of free_and_strdup to initialize default unit (#4335)0xAX
Previously we've used free_and_strdup() to fill arg_default_unit with unit name, If we didn't pass default unit name through a kernel command line or command line arguments. But we can use just strdup() instead of free_and_strdup() for this, because we will start fill arg_default_unit only if it wasn't set before.
2016-10-10exit-status: kill is_clean_exit_lsb(), move logic to sysv-generatorLennart Poettering
Let's get rid of is_clean_exit_lsb(), let's move the logic for the special handling of the two LSB exit codes into the sysv-generator by writing out appropriate SuccessExitStatus= lines if the LSB header exists. This is not only semantically more correct, bug also fixes a bug as the code in service.c that chose between is_clean_exit_lsb() and is_clean_exit() based this check on whether a native unit files was available for the unit. However, that check was bogus since a long time, since the SysV generator was introduced and native SysV script support was removed from PID 1, as in that case a unit file always existed.
2016-10-10tree-wide: pass return value of make_null_stdio() to warning instead of ↵0xAX
errno (#4328) as @poettering suggested in the #4320
2016-10-09main: initialize default unit little later (#4321)0xAX
systemd fills arg_default_unit during startup with default.target value. But arg_default_unit may be overwritten in parse_argv() or parse_proc_cmdline_item(). Let's check value of arg_default_unit after calls of parse_argv() and parse_proc_cmdline_item() and fill it with default.target if it wasn't filled before. In this way we will not spend unnecessary time to for filling arg_default_unit with default.target.
2016-10-09tree-wide: print warning in a failure case of make_null_stdio() (#4320)0xAX
The make_null_stdio() may fail. Let's check its result and print warning message instead of keeping silence.
2016-10-07core: add "invocation ID" concept to service managerLennart Poettering
This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
2016-10-07core: only warn on short reads on signal fdZbigniew Jędrzejewski-Szmek