summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2015-09-01core: unified cgroup hierarchy supportLennart Poettering
This patch set adds full support the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possibly to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified heirarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicey, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefuly makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.
2015-09-01Merge pull request #1107 from msekletar/selinux-get-raw-contextLennart Poettering
selinux: always use *_raw API from libselinux
2015-09-01Merge pull request #1111 from poettering/more-cgroup-fixesTom Gundersen
More cgroup fixes
2015-09-01Merge pull request #1099 from filbranden/joincontrollers2Lennart Poettering
Getting rid of FOREACH_WORD_QUOTED in config_parse_join_controllers
2015-09-01core: rework when we kill with which signalLennart Poettering
When the user wants to explicitly send our own PID a signal, then do so. Don't follow up SIGABRT with a SIGHUP if send_sighup is enabled. At that point the process should have segfaulted, hence there's no point in following up with a SIGHUP. Send only termination signals to ourselves, never KILL or ABRT signals.
2015-09-01core: don't allow changing the slice of a unit while it is activeLennart Poettering
2015-09-01unit: small clean-upsLennart Poettering
Always say when we ignore errors. Cast calls whose return value we knowingly ingore to (void). Use "bool" where we actually mean a boolean, even if we return it as an int later on.
2015-09-01core: when looking for the unit for a process, look at the PID hashmaps firstLennart Poettering
It's cheaper that going to cgroupfs, and also usually the better choice since it's not racy and can map PIDs even if they were moved to a different unit.
2015-09-01run: enable interactive authorizationEvgeny Vereshchagin
2015-09-01cgroup: the root cgroup is always populatedLennart Poettering
2015-09-01cgroup: drop "ignore_self" argument from cg_is_empty()Lennart Poettering
In all cases where the function (or cg_is_empty_recursive()) ignoring the calling process is actually wrong, as a process keeps a cgroup busy regardless if its the current one or another. Hence, let's simplify things and drop the "ignore_self" parameter.
2015-09-01cgroup: small cleanups and coding style fixesLennart Poettering
A number of simplications and adjustments to brings things closer to our coding style.
2015-09-01cgroup: don't allow hidden cgroupsLennart Poettering
We really should care for all cgroups, and not allow hidden ones.
2015-09-01cgroup: never migrate kernel threads out of the root cgroupLennart Poettering
It won't work anyway.
2015-09-01Merge pull request #1108 from phomes/dont-shadow-globalsDavid Herrmann
tree-wide: do not shadow the global var timezone
2015-09-01tree-wide: do not shadow the global var timezoneThomas Hindoe Paaboel Andersen
2015-09-01units: enable waiting for unit termination in certain casesLennart Poettering
The legacy cgroup hierarchy does not support reliable empty notifications in containers and if there are left-over subgroups in a cgroup. This makes it hard to correctly wait for them running empty, and thus we previously disabled this logic entirely. With this change we explicitly check for the container case, and whether the unit is a "delegation" unit (i.e. one where programs may create their own subgroups). If we are neither in a container, nor operating on a delegation unit cgroup empty notifications become reliable and thus we start waiting for the empty notifications again. This doesn't really fix the general problem around cgroup notifications but reduces the effect around it. (This also reorders #include lines by their focus, as suggsted in CODING_STYLE. We have to add "virt.h", so let's do that at the right place.) Also see #317.
2015-09-01core: add OOM check in config_parse_join_controllersFilipe Brandenburger
2015-09-01core: Log parse errors in config_parse_join_controllersFilipe Brandenburger
2015-09-01unit: suppress unnecessary cgroup empty checkLennart Poettering
Rework the "service is good" check, to only check the cgroup state if we really need to instead of always. This allows us to suppress going to the cgroupfs for an empty check for the majority of services. No functional change.
2015-09-01manager: don't write first-boot flag file all the timeLennart Poettering
Instead, remember that we have already written it.
2015-09-01sd-login: improve error handlingLennart Poettering
let's return ENXIO whenever we don't know something rather than ENOENT. ENOENT suggests this was really about a file or directory, while ENXIO is a more generic "not found" indicator.
2015-09-01cgtop: properly show "/" instead of empty string in cgroup listLennart Poettering
2015-09-01set: return NULL on destructorsLennart Poettering
Like we do it pretty much everywhere else.
2015-09-01selinux: always use *_raw API from libselinuxMichal Sekletar
When mcstransd* is running non-raw functions will return translated SELinux context. Problem is that libselinux will cache this information and in the future it will return same context even though mcstransd maybe not running at that time. If you then check with such context against SELinux policy then selinux_check_access may fail depending on whether mcstransd is running or not. To workaround this problem/bug in libselinux, we should always get raw context instead. Most users will not notice because result of access check is logged only in debug mode. * SELinux context translation service, which will translates labels to human readable form
2015-09-01Merge pull request #1066 from ssahani/tunnelLennart Poettering
networkd: add support for tunnel encap limit
2015-09-01logind: Listen to WMI hotkeys to catch SW_DOCK state/eventsMartin Pitt
On Dell and HP laptops the dock state/events (SW_DOCK) come from the "{Dell,HP} WMI hotkeys" input devices. Tag them as power-switch so that login actually considers them. Use a general match in case this affects other vendors, too. Thanks to Andreas Schultz for debugging this! https://launchpad.net/bugs/1450009
2015-08-31core: Use extract_first_word in config_parse_join_controllersFilipe Brandenburger
Related to the TODO item to replace FOREACH_WORD_QUOTED with it. Tested by setting `JoinControllers=cpu,cpuacct,memory net_cls,blkio' in /etc/systemd/system.conf, rebooting the system with the patched binaries and checking that the desired setup was created by inspecting the entries under /sys/fs/cgroup. No regressions observed in test cases.
2015-08-31Merge pull request #1097 from teg/dhcp-server-2David Herrmann
dhcp-server: make pool configurable
2015-08-31networkd: dhcp-server - allow configuration of the poolTom Gundersen
The constraints we place on the pool is that it is a contiguous sequence of addresses in the same subnet as the server address, not including the subnet nor broadcast addresses, but possibly including the server address itself. If the server address is included in the pool it is (obviously) reserved and not handed out to clients.
2015-08-31networkd: dhcp-server - default to manage the whole subnetTom Gundersen
Don't restrict yourselves to 32 leases, simply manage the whole subnet by default.
2015-08-31sd-dhcp-server: simplify pool creationTom Gundersen
Merge sd_dhcp_server_set_address() and sd_dhcp_server_set_lease_pool() into sd_dhcp_server_configure_pool() as the behavior of the two former depends on the order they are called in. The flexibility is not needed, so let's just do this in one call.
2015-08-31login: support user-bus on dbus1David Herrmann
dbus-1.10 was just released, including systemd units to run `dbus-daemon --session` as systemd user unit. This allows using a user-bus with dbus1, just like we do per default with kdbus. All the dbus libraries have already been fixed long ago to use the user-bus as default. Hence, there's no need to set DBUS_SESSION_BUS_ADDRESS= if we use the user-bus. However, gdm and friends continue to spawn a session bus if this variable is not set (instead of checking for the existence of the user-bus). Hence, we force the user-bus, if it is available, in pam_systemd. Once gdm and friends are fixed, we can continue to drop this again. However, that might take a while. With this in place, all that is needed to make the user-bus work is: `systemctl --global enable dbus.socket` If dbus.socket is not enabled, the legacy session-bus is still used. Based on a patch by: Jan Alexander Steffens <jan.steffens@gmail.com>
2015-08-31cgtop: rework error handlingLennart Poettering
Never report errors twice.
2015-08-31sd-event: improve debug message when we fail to remove and fd from an epollLennart Poettering
Let's help users to debug issues with epoll fd removal by printing the name of the event source.
2015-08-31cgls: pretty print root cgroup pathLennart Poettering
Make sure show it as "/" rather than empty string.
2015-08-31manager: remove ask-password fd from sd_event before closing itLennart Poettering
Otherwise we might attempt to remove a non-existing fd from epoll.
2015-08-31cgtop: allow toggling of --recursive= and -k at runtimeLennart Poettering
2015-08-31cgtop: recursively count cgroup member tasksLennart Poettering
When showing the number of tasks in a cgroup, recursively count tasks in child cgroups and include them in the number. This ensures that the number of tasks is cummulative the same way as memory, cpu and IO resources are. Old behaviour can be restored by passing the new --recursive=no switch.
2015-08-31cgtop: ignore kernel threads when counting tasksLennart Poettering
However, allow them to be counted in by specifying -k
2015-08-31cgls: print the expressive error message we haveLennart Poettering
2015-08-31process-util: trivial optimizationLennart Poettering
2015-08-31cgtop: show resource usage relative to cgroup root onlyLennart Poettering
This way the output is restricted to cgroups from a container when run in one.
2015-08-31unit: minor simplificationLennart Poettering
2015-08-31util: treat 'C' and 'POSIX' locale identicalLennart Poettering
2015-08-31pager: set $LESSCHARSET when we output UTF8 charsLennart Poettering
This way we can be sure that less has the same idea of the terminal as we do. This solves issues in systems that have locale uninitalized, where systemd would output UTF-8 but less wouldn't allow it and show them as control characters.
2015-08-31unit: unify how we assing slices to unitsLennart Poettering
This adds a new call unit_set_slice(), and simplifies unit_add_default_slice(). THis should make our code a bit more robust and simpler.
2015-08-31unit: add new macros to test for unit contextsLennart Poettering
2015-08-31core: use DUAL_TIMESTAMP_NULL where we canLennart Poettering
2015-08-31core: don't generate stub unit file for transient unitsLennart Poettering
We store the properties for transient units in drop-ins anyway, and units don't have to have fragment files, hence don't bother with them, and don't create them.