summaryrefslogtreecommitdiff
path: root/src/core
AgeCommit message (Collapse)Author
2015-09-04core: split up manager_get_unit_by_pid()Lennart Poettering
Let's move the actual cgroup part of it into a new separate function manager_get_unit_by_pid_cgroup(), and then make manager_get_unit_by_pid() just a wrapper that also checks the two pid hashmaps. Then, let's make sure the various calls that want to deliver events to the owners of a PID check both hashmaps and the cgroup and deliver the event to *each* of them. OTOH make sure bus calls like GetUnitByPID() continue to check the PID hashmaps first and the cgroup only as fallback.
2015-09-04macro: introduce new PID_TO_PTR macros and make use of themLennart Poettering
This adds a new PID_TO_PTR() macro, plus PTR_TO_PID() and makes use of it wherever we maintain processes in a hash table. Previously we sometimes used LONG_TO_PTR() and other times ULONG_TO_PTR() for that, hence let's make this more explicit and clean up things.
2015-09-03Merge pull request #1123 from phomes/scope-no-bool-vs-intLennart Poettering
scope: do not compare a bool return with "<= 0"
2015-09-02tree-wide: fix indentationThomas Hindoe Paaboel Andersen
2015-09-02scope: do not compare a bool return with "<= 0"Thomas Hindoe Paaboel Andersen
2015-09-02Merge pull request #1116 from poettering/unified-rebasedLennart Poettering
core: unified cgroup hierarchy support
2015-09-01core: unified cgroup hierarchy supportLennart Poettering
This patch set adds full support the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possibly to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified heirarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicey, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefuly makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.
2015-09-01Merge pull request #1098 from filbranden/cpuaffinity2Lennart Poettering
Getting rid of FOREACH_WORD_QUOTED and some more cleanup in config_parse_cpu_affinity2
2015-09-01Merge pull request #1107 from msekletar/selinux-get-raw-contextLennart Poettering
selinux: always use *_raw API from libselinux
2015-09-01core: Log parse errors in config_parse_cpu_affinity2Filipe Brandenburger
2015-09-01Merge pull request #1111 from poettering/more-cgroup-fixesTom Gundersen
More cgroup fixes
2015-09-01core: rework when we kill with which signalLennart Poettering
When the user wants to explicitly send our own PID a signal, then do so. Don't follow up SIGABRT with a SIGHUP if send_sighup is enabled. At that point the process should have segfaulted, hence there's no point in following up with a SIGHUP. Send only termination signals to ourselves, never KILL or ABRT signals.
2015-09-01core: don't allow changing the slice of a unit while it is activeLennart Poettering
2015-09-01unit: small clean-upsLennart Poettering
Always say when we ignore errors. Cast calls whose return value we knowingly ingore to (void). Use "bool" where we actually mean a boolean, even if we return it as an int later on.
2015-09-01core: when looking for the unit for a process, look at the PID hashmaps firstLennart Poettering
It's cheaper that going to cgroupfs, and also usually the better choice since it's not racy and can map PIDs even if they were moved to a different unit.
2015-09-01cgroup: drop "ignore_self" argument from cg_is_empty()Lennart Poettering
In all cases where the function (or cg_is_empty_recursive()) ignoring the calling process is actually wrong, as a process keeps a cgroup busy regardless if its the current one or another. Hence, let's simplify things and drop the "ignore_self" parameter.
2015-09-01cgroup: small cleanups and coding style fixesLennart Poettering
A number of simplications and adjustments to brings things closer to our coding style.
2015-09-01units: enable waiting for unit termination in certain casesLennart Poettering
The legacy cgroup hierarchy does not support reliable empty notifications in containers and if there are left-over subgroups in a cgroup. This makes it hard to correctly wait for them running empty, and thus we previously disabled this logic entirely. With this change we explicitly check for the container case, and whether the unit is a "delegation" unit (i.e. one where programs may create their own subgroups). If we are neither in a container, nor operating on a delegation unit cgroup empty notifications become reliable and thus we start waiting for the empty notifications again. This doesn't really fix the general problem around cgroup notifications but reduces the effect around it. (This also reorders #include lines by their focus, as suggsted in CODING_STYLE. We have to add "virt.h", so let's do that at the right place.) Also see #317.
2015-09-01core: add OOM check in config_parse_join_controllersFilipe Brandenburger
2015-09-01core: Log parse errors in config_parse_join_controllersFilipe Brandenburger
2015-09-01unit: suppress unnecessary cgroup empty checkLennart Poettering
Rework the "service is good" check, to only check the cgroup state if we really need to instead of always. This allows us to suppress going to the cgroupfs for an empty check for the majority of services. No functional change.
2015-09-01manager: don't write first-boot flag file all the timeLennart Poettering
Instead, remember that we have already written it.
2015-09-01selinux: always use *_raw API from libselinuxMichal Sekletar
When mcstransd* is running non-raw functions will return translated SELinux context. Problem is that libselinux will cache this information and in the future it will return same context even though mcstransd maybe not running at that time. If you then check with such context against SELinux policy then selinux_check_access may fail depending on whether mcstransd is running or not. To workaround this problem/bug in libselinux, we should always get raw context instead. Most users will not notice because result of access check is logged only in debug mode. * SELinux context translation service, which will translates labels to human readable form
2015-08-31core: Use extract_first_word in config_parse_join_controllersFilipe Brandenburger
Related to the TODO item to replace FOREACH_WORD_QUOTED with it. Tested by setting `JoinControllers=cpu,cpuacct,memory net_cls,blkio' in /etc/systemd/system.conf, rebooting the system with the patched binaries and checking that the desired setup was created by inspecting the entries under /sys/fs/cgroup. No regressions observed in test cases.
2015-08-31util: Declare a cleanup routine for a cpu_set_tFilipe Brandenburger
Make use of it in config_parse_cpu_affinity2. Tested by tweaking the `CPUAffinity' setting in /etc/systemd/system.conf and reloading the daemon to confirm it is working as expected. No regressions observed in test cases.
2015-08-31core: Use extract_first_word in config_parse_cpu_affinity2Filipe Brandenburger
Related to the TODO item to replace FOREACH_WORD_QUOTED with it. Tested by setting `CPUAfinity=0 1' (and other similar settings) in /etc/systemd/system.conf, booting the system with the patched binaries (and also using `systemctl daemon-reload` to reconfigure) and checking that /proc/1/status indicates only CPUs 0 and 1 are allowed for PID 1. No regressions observed in test cases.
2015-08-31manager: remove ask-password fd from sd_event before closing itLennart Poettering
Otherwise we might attempt to remove a non-existing fd from epoll.
2015-08-31unit: minor simplificationLennart Poettering
2015-08-31unit: unify how we assing slices to unitsLennart Poettering
This adds a new call unit_set_slice(), and simplifies unit_add_default_slice(). THis should make our code a bit more robust and simpler.
2015-08-31unit: add new macros to test for unit contextsLennart Poettering
2015-08-31core: use DUAL_TIMESTAMP_NULL where we canLennart Poettering
2015-08-31core: don't generate stub unit file for transient unitsLennart Poettering
We store the properties for transient units in drop-ins anyway, and units don't have to have fragment files, hence don't bother with them, and don't create them.
2015-08-31socket: fix setsockopt call. SOL_SOCKET changed to SOL_TCP.Robin Hack
2015-08-30core: add attribute printf to null_log()Cristian Rodríguez
2015-08-28core: add unit_dbus_interface_from_type() to unit-name.hLennart Poettering
Let's add a way to get the type-specific D-Bus interface of a unit from either its type or name to src/basic/unit-name.[ch]. That way we can share it with the client side, where it is useful in tools like cgls or machinectl. Also ports over machinectl to make use of this.
2015-08-27Revert "sd-bus: do not connect to dbus-1 socket when kdbus is available"David Herrmann
This reverts commit d4d00020d6ad855d65d31020fefa5003e1bb477f. The idea of the commit is broken and needs to be reworked. We really cannot reduce the bus-addresses to a single address. We always will have systemd with native clients and legacy clients at the same time, so we also need both addresses at the same time.
2015-08-27selinux: drop mac_selinux_unit_access_check_strv()David Herrmann
It is not acceptable to load unit files during enable/disable operations just to figure out the selinux labels. systemd implements lazy loading for units, so the selinux hooks need to follow it. This drops the mac_selinux_unit_access_check_strv() helper which implements a non-acceptable policy check. If anyone cares for that functionality, you really should pass a callback+userdata to the helpers in src/shared/install.c which does policy checks on each touched file. See #1050 on github for more.
2015-08-26selinux: fix regression of systemctl subcommands when absolute unit file ↵HATAYAMA Daisuke
paths are specified The commit 4938696301a914ec26bcfc60bb99a1e9624e3789 overlooked the fact that unit files can be specified as unit file paths, not unit file names, wrongly passing a unit file path to the 1st argument of manager_load_unit() that handles it as a unit file name. As a result, the following 4 systemctl subcommands: enable disable reenable link mask unmask fail with the following error message: # systemctl enable /usr/lib/systemd/system/kdump.service Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid. # systemctl disable /usr/lib/systemd/system/kdump.service Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid. # systemctl reenable /usr/lib/systemd/system/kdump.service Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid. # cp /usr/lib/systemd/system/kdump.service /tmp/ # systemctl link /tmp/kdump.service Failed to execute operation: Unit name /tmp/kdump.service is not valid. # systemctl mask /usr/lib/systemd/system/kdump.service Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid. # systemctl unmask /usr/lib/systemd/system/kdump.service Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid. To fix the issue, first check whether a unit file is passed as a unit file name or a unit file path, and then pass the unit file to the appropreate argument of manager_load_unit(). By the way, even with this commit mask and unmask reject unit file paths as follows and this is a correct behavior: # systemctl mask /usr/lib/systemd/system/kdump.service Failed to execute operation: Invalid argument # systemctl unmask /usr/lib/systemd/system/kdump.service Failed to execute operation: Invalid argument
2015-08-25Merge pull request #1040 from poettering/cgroup-path-fixDaniel Mack
fix "systemctl status idontexist.service" showing the full cgroup tree
2015-08-25execute: make the invalid entry of the enum -1Thomas Hindoe Paaboel Andersen
Set _EXEC_UTMP_MODE_INVALID to -1. This matches the return value from string_table_lookup.
2015-08-25core: report root cgroup as "/" over the busLennart Poettering
Internally, the root cgroup is stored as the empty string in Unit.cgroup_path, and "no cgroup" as NULL. Unfortunately, D-Bus does not know a NULL concept, hence when reporting the cgroup to clients we should turn the root cgroup into "/", and leave the empty string for the "no cgroup" case. This should make sure that "systemctl status -- -.slice" works correctly and shows the entire cgroup tree.
2015-08-25core: drop spurious new lineLennart Poettering
2015-08-24core: open up more executable properties via the busLennart Poettering
This is preparation for a later commit that makes use of these properties for spawning an interactive shell in a container.
2015-08-24core: optionally create LOGIN_PROCESS or USER_PROCESS utmp entriesLennart Poettering
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS and USER_PROCESS entries, instead of just a single INIT_PROCESS entry. With this change systemd may be used to not only invoke a getty directly in a SysV-compliant way but alternatively also a login(1) implementation or even forego getty and login entirely, and invoke arbitrary shells in a way that they appear in who(1) or w(1). This is preparation for a later commit that adds a "machinectl shell" operation to invoke a shell in a container, in a way that is compatible with who(1) and w(1).
2015-08-21core: downgrade "Module inserted" message for kmod to DEBUGLennart Poettering
Closes #919.
2015-08-17Merge pull request #977 from richardmaw-codethink/machinectl-userns-login-v2Lennart Poettering
Fix machinectl login with containers in user namespaces (v2)
2015-08-17namespace helpers: Allow entering a UID namespaceRichard Maw
To be able to use `systemd-run` or `machinectl login` on a container that is in a private user namespace, the sub-process must have entered the user namespace before connecting to the container's D-Bus, otherwise the UID and GID in the peer credentials are garbage. So we extend namespace_open and namespace_enter to support UID namespaces, and we enter the UID namespace in bus_container_connect_{socket,kernel}. namespace_open will degrade to a no-op if user namespaces are not enabled in the kernel. Special handling is required for the setns call in namespace_enter with a user namespace, since transitioning to your own namespace is forbidden, as it would result in re-entering your user namespace as root. Arguably it may be valid to check this at the call site, rather than inside namespace_enter, but it is less code to do it inside, and if the intention of calling namespace_enter is to *be* in the target namespace, rather than to transition to the target namespace, it is a reasonable approach. The check for whether the user namespace is the same must happen before entering namespaces, as we may not be able to access /proc during the intermediate transition stage. We can't instead attempt to enter the user namespace and then ignore the failure from it being the same namespace, since the error code is not distinct, and we can't compare namespaces while mid-transition.
2015-08-17Bug #944: Deletion of unnecessary checks before a few calls of systemd functionsMarkus Elfring
The following functions return immediately if a null pointer was passed. * calendar_spec_free * link_address_free * manager_free * sd_bus_unref * sd_journal_close * udev_monitor_unref * udev_unref It is therefore not needed that a function caller repeats a corresponding check. This issue was fixed by using the software Coccinelle 1.0.1.
2015-08-16Merge pull request #908 from richardmaw-codethink/nspawn-path-escapes-v3Lennart Poettering
Allow arbitrary file paths to be passed to nspawn (v3)
2015-08-11 sd-bus: do not connect to dbus-1 socket when kdbus is availableKay Sievers
We should not fall back to dbus-1 and connect to the proxy when kdbus returns an error that indicates that kdbus is running but just does not accept new connections because of quota limits or something similar. Using is_kdbus_available() in libsystemd/ requires it to move from shared/ to libsystemd/. Based on a patch from David Herrmann: https://github.com/systemd/systemd/pull/886