path: root/src/core
Age | Commit message | Author
2016-05-18 | core: update CGroupBlockIODeviceBandwidth to record both rbps and wbps | Tejun Heo
CGroupBlockIODeviceBandwidth is used to keep track of IO bandwidth limits for legacy cgroup hierarchies. Unlike the unified hierarchy counterpart CGroupIODeviceLimit, a CGroupBlockIODeviceBandwidth records either a read or write limit and has a couple of issues.
* There's no way to clear a specific config entry.
* When configs are cleared for an IO direction of a unit, the kernel settings aren't cleared accordingly, creating discrepancies.
This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to CGroupIODeviceLimit: each entry records both rbps and wbps limits and is cleared if both are at default values after kernel settings are updated.
2016-05-18 | core: add support for IOReadIOPSMax and IOWriteIOPSMax | Tejun Heo
cgroup IO controller supports maximum limits for both bandwidth and IOPS but systemd resource control currently only supports bandwidth limits. This patch adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy is in use. It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy hierarchies but IO control on legacy hierarchies is half-broken anyway, so let's leave it alone for now.
2016-05-18 | core: introduce CGroupIOLimitType enums | Tejun Heo
Currently, there are two cgroup IO limits, bandwidth max for read and write, and they are hard-coded in various places. This is fine for two limits but IO is expected to grow more limits - low, high and max limits for bandwidth and IOPS - and hard-coding each limit won't make sense. This patch replaces hard-coded limits with an array indexed by CGroupIOLimitType and accompanying string and default value tables so that new limits can be added trivially.
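The table-driven layout described above might look roughly like the following. This is a minimal sketch, not the patch itself: the type and table names (`IoLimitType`, `io_limit_defaults`, `io_limit_names`, `IoDeviceLimit`) are hypothetical, and representing "no limit" as `UINT64_MAX` is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names modelled on the commit message, not copied from systemd. */
typedef enum IoLimitType {
        IO_LIMIT_RBPS_MAX,
        IO_LIMIT_WBPS_MAX,
        _IO_LIMIT_TYPE_MAX,
        _IO_LIMIT_TYPE_INVALID = -1
} IoLimitType;

/* Assumption: "no limit" is represented as UINT64_MAX. */
static const uint64_t io_limit_defaults[_IO_LIMIT_TYPE_MAX] = {
        [IO_LIMIT_RBPS_MAX] = UINT64_MAX,
        [IO_LIMIT_WBPS_MAX] = UINT64_MAX,
};

static const char *const io_limit_names[_IO_LIMIT_TYPE_MAX] = {
        [IO_LIMIT_RBPS_MAX] = "IOReadBandwidthMax",
        [IO_LIMIT_WBPS_MAX] = "IOWriteBandwidthMax",
};

/* One entry per device, holding every limit type in a single array. */
typedef struct IoDeviceLimit {
        const char *path;
        uint64_t limits[_IO_LIMIT_TYPE_MAX];
} IoDeviceLimit;

/* Resets every limit to its default by walking the table. */
static void io_device_limit_reset(IoDeviceLimit *l) {
        assert(l);
        for (int t = 0; t < _IO_LIMIT_TYPE_MAX; t++)
                l->limits[t] = io_limit_defaults[t];
}

/* Maps a setting name back to its enum value via the string table. */
static IoLimitType io_limit_type_from_string(const char *s) {
        assert(s);
        for (int t = 0; t < _IO_LIMIT_TYPE_MAX; t++)
                if (strcmp(io_limit_names[t], s) == 0)
                        return (IoLimitType) t;
        return _IO_LIMIT_TYPE_INVALID;
}
```

With this shape, adding a new limit means adding one enum value and one row in each table; the loops need no changes.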
2016-05-17 | core/dbus: use free_and_strdup to simplify code (#3279) | Jonathan Boulle
Makes it consistent with the other branches here.
2016-05-16 | Merge pull request #3193 from htejun/cgroup-io-controller | Lennart Poettering
core: add io controller support on the unified hierarchy
2016-05-16 | core: don't log job status message in case job was effectively NOP (#3199) | Michal Sekletar
We currently generate a log message about a unit being started even when the unit was already started and the job didn't do anything. This is because the job was requested explicitly and hence became the anchor job of the transaction, so we could not eliminate it. That is fine, but let's not pollute the journal with useless log messages.
  $ systemctl start systemd-resolved
  $ systemctl start systemd-resolved
  $ systemctl start systemd-resolved
Current state:
  $ journalctl -u systemd-resolved | grep Started
  May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution.
  May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution.
  May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution.
After patch applied:
  $ journalctl -u systemd-resolved | grep Started
  May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution.
Fixes #1723
2016-05-15 | namespace: Make private /dev noexec and readonly (#3263) | topimiettinen
Private /dev will not be managed by udev or others, so we can make it noexec and readonly after we have made all device nodes. As /dev/shm needs to be writable, we can't use bind_remount_recursive().
2016-05-14 | core: allow slice to be overridden if cgroups aren't realized (#3246) | Tejun Heo
unit_set_slice() fails with -EBUSY if the unit already has a slice associated with it. This makes it impossible to override the slice through drop-in config or over D-Bus. There's no reason to disallow slice changes as long as cgroups aren't realized. Fix it.
Fixes #3240.
Signed-off-by: Tejun Heo <htejun@fb.com>
Reported-by: Davide Cavalca <dcavalca@fb.com>
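The relaxed check can be pictured as follows. This is a simplified sketch under stated assumptions: `UnitSketch` and `unit_set_slice_sketch()` are hypothetical stand-ins, and the real unit object tracks the slice as a unit reference rather than a string.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the unit object. */
typedef struct UnitSketch {
        const char *slice;      /* currently assigned slice, or NULL */
        bool cgroup_realized;   /* true once the cgroup exists in the kernel */
} UnitSketch;

/* Returns 1 if the slice was changed, 0 if it was already set to the
 * requested value, and -EBUSY if a change is no longer possible. */
static int unit_set_slice_sketch(UnitSketch *u, const char *slice) {
        assert(u);
        assert(slice);

        if (u->slice && strcmp(u->slice, slice) == 0)
                return 0;

        /* Refuse only when a slice is set AND the cgroup is already realized;
         * before realization an override is harmless. */
        if (u->slice && u->cgroup_realized)
                return -EBUSY;

        u->slice = slice;
        return 1;
}
```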
2016-05-14 | namespace: unmount old /dev under our new private /dev (#3254) | topimiettinen
Drop all dangling old /dev mounts before mounting a new private /dev tree.
2016-05-12 | core: added ListUnitsByNames dbus method (#3182) | kayrus
This new method returns information by unit names. Unlike ListUnitsByPatterns, this method also returns information about inactive and even nonexistent units. Moved dbus unit reply logic into a separate shared function. Resolves https://github.com/coreos/fleet/pull/1418
2016-05-10 | Merge pull request #3220 from keszybz/install-fixes | Lennart Poettering
Fix "preset-all" with dangling symlinks and install-section hint emitted too eagerly
2016-05-09 | tree-wide: port more code to use ifname_valid() | Lennart Poettering
2016-05-08 | Merge pull request #3202 from poettering/socket-fixes | Martin Pitt
don't reopen socket fds when reloading the daemon
2016-05-07 | core/mount: add helper function for mount states | Zbigniew Jędrzejewski-Szmek
2016-05-07 | Merge pull request #3160 from htejun/cgroup-fixes-rev2 | Zbigniew Jędrzejewski-Szmek
Cgroup fixes.
2016-05-07 | Merge pull request #3191 from poettering/cgroups-agent-dgram | Evgeny Vereshchagin
core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification
2016-05-06 | core: dump TriggerLimitIntervalSec and TriggerLimitBurst too | Evgeny Vereshchagin
2016-05-06 | core: expose TriggerLimitIntervalUSec | Evgeny Vereshchagin
Before:
  $ systemctl show --property TriggerLimitIntervalSec test.socket
  TriggerLimitIntervalSec=2000000
After:
  $ systemctl show --property TriggerLimitIntervalUSec test.socket
  TriggerLimitIntervalUSec=2s
2016-05-06 | core: update the right mtime after finishing writing of transient units (#3203) | Lennart Poettering
Fixes: #3194
2016-05-06 | core: rework how we flush incoming traffic when a socket unit goes down | Lennart Poettering
Previously, we'd simply close and reopen the socket file descriptors. This is problematic however, as we won't transition through the SOCKET_CHOWN state then, and thus the file ownership won't be correct for the sockets. Rework the flushing logic, and actually read any queued data from the sockets for flushing, and accept any queued messages and disconnect them.
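For datagram sockets, "reading any queued data" can be sketched as a non-blocking drain loop. The helper name `flush_datagrams()` is hypothetical, and this sketch covers only the datagram case; listening stream sockets would instead need queued connections accepted and closed, which is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: discards every datagram currently queued on fd
 * without closing it. Returns the number of datagrams dropped, or a
 * negative errno-style value on failure. Only meant for datagram sockets. */
static int flush_datagrams(int fd) {
        char buf[4096];
        int dropped = 0;

        assert(fd >= 0);

        for (;;) {
                /* MSG_DONTWAIT makes this call non-blocking regardless of
                 * the socket's own O_NONBLOCK setting. */
                ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                                return dropped; /* queue is empty */
                        return -errno;
                }
                dropped++;
        }
}
```

Because the descriptor stays open, its ownership and mode are left untouched, which is the property the commit is after.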
2016-05-06 | core: don't implicitly open missing socket fds on daemon reload | Lennart Poettering
Previously, when the daemon was reloaded and the configuration of a socket unit file was changed so that a different set of socket ports was defined for the socket, we'd simply reopen the socket fds not yet open. This is problematic however, as this means the SOCKET_CHOWN state is not run for them, and thus their UID/GID is not corrected. With this change, don't open the missing file descriptors, but log about this issue, and ask the user to restart the socket explicitly, to make sure all missing fds are opened.
Fixes: #3171
2016-05-06 | core: split out selinux label retrieval logic into a function of its own | Lennart Poettering
This should bring no behavioural change.
2016-05-05 | core: add io controller support on the unified hierarchy | Tejun Heo
On the unified hierarchy, blkio controller is renamed to io and the interface is changed significantly.
* blkio.weight and blkio.weight_device are consolidated into io.weight which uses the standardized weight range [1, 10000] with 100 as the default value.
* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max. Expansion of throttling features is being worked on to support work-conserving absolute limits (io.low and io.high).
* All stats are consolidated into io.stats.
This patchset adds support for the new interface. As the interface has been revamped and new features are expected to be added, it seems best to treat it as a separate controller rather than trying to expand the blkio settings although we might add automatic translation if only blkio settings are specified.
* io.weight handling is mostly identical to blkio.weight[_device] handling except that the weight range is different.
* Both read and write bandwidth settings are consolidated into CGroupIODeviceLimit which describes all limits applicable to the device. This makes it less painful to add new limits.
* "max" can be used to specify the maximum limit which is equivalent to no config for max limits and treated as such. If a given CGroupIODeviceLimit doesn't contain any non-default configs, the config struct is discarded once the no limit config is applied to cgroup.
* lookup_blkio_device() is renamed to lookup_block_device().
Signed-off-by: Tejun Heo <htejun@fb.com>
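The weight range and the special "max" value described above can be illustrated with a small sketch. All names here (`IO_WEIGHT_MIN`, `io_limit_parse()`, `io_limits_all_default()`) are hypothetical, the parser accepts only plain decimal integers or "max" (no size suffixes), and `UINT64_MAX` as the "no limit" marker is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Range and default taken from the commit message. */
#define IO_WEIGHT_MIN     1u
#define IO_WEIGHT_MAX     10000u
#define IO_WEIGHT_DEFAULT 100u

/* Assumption: "no limit" is stored as UINT64_MAX. */
#define IO_LIMIT_NONE UINT64_MAX

static bool io_weight_is_valid(uint64_t w) {
        return w >= IO_WEIGHT_MIN && w <= IO_WEIGHT_MAX;
}

/* Parses "max" or a plain decimal number. Returns 0 on success or a
 * negative errno-style value. */
static int io_limit_parse(const char *s, uint64_t *ret) {
        char *end = NULL;
        unsigned long long v;

        assert(s);
        assert(ret);

        if (strcmp(s, "max") == 0) {
                *ret = IO_LIMIT_NONE;   /* "max" means: no limit configured */
                return 0;
        }

        if (*s < '0' || *s > '9')
                return -EINVAL;         /* rejects empty input, signs, whitespace */

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0)
                return -errno;
        if (*end != '\0')
                return -EINVAL;

        *ret = (uint64_t) v;
        return 0;
}

/* True if an entry carries no non-default limit and may be discarded. */
static bool io_limits_all_default(const uint64_t *limits, size_t n) {
        for (size_t i = 0; i < n; i++)
                if (limits[i] != IO_LIMIT_NONE)
                        return false;
        return true;
}
```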
2016-05-05 | core: fix owner user/group output in socket dump | Lennart Poettering
The unit file settings are called SocketUser= and SocketGroup= hence name these fields that way in the "systemd-analyze dump" output too. https://github.com/systemd/systemd/issues/3171#issuecomment-216216995
2016-05-05 | core: change default trigger limits for socket units | Lennart Poettering
Let's lower the default values a bit, and pick different defaults for Accept=yes and Accept=no sockets. Fixes: #3167
2016-05-05 | tree-wide: introduce new SOCKADDR_UN_LEN() macro, and use it everywhere | Lennart Poettering
The macro determines the right length of an AF_UNIX "struct sockaddr_un" to pass to connect() or bind(). It automatically figures out if the socket refers to an abstract namespace socket, or a socket in the file system, and properly handles the full length of the path field. This macro is not only safer, but also simpler to use, than the usual offsetof() + strlen() logic.
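The length computation such a macro performs can be sketched as a plain function. This is an illustration, not the macro from the tree: `sockaddr_un_len()` is a hypothetical name, and it assumes abstract names contain no embedded NUL bytes after the leading one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical function form of the idea: returns the address length to
 * pass to bind() or connect() for a filled-in AF_UNIX address. */
static socklen_t sockaddr_un_len(const struct sockaddr_un *sa) {
        assert(sa);
        assert(sa->sun_family == AF_UNIX);

        if (sa->sun_path[0] == '\0')
                /* Abstract namespace: count the leading NUL plus the name.
                 * Assumes the name itself has no embedded NUL bytes. */
                return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + 1 +
                                    strnlen(sa->sun_path + 1, sizeof(sa->sun_path) - 1));

        /* File system socket: strnlen() bounds the scan, so a path that
         * fills sun_path without a trailing NUL is still handled. */
        return (socklen_t) (offsetof(struct sockaddr_un, sun_path) +
                            strnlen(sa->sun_path, sizeof(sa->sun_path)));
}
```

A caller would then write `bind(fd, (struct sockaddr *) &sa, sockaddr_un_len(&sa))` instead of repeating the offsetof() arithmetic at every call site.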
2016-05-05 | core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification | Lennart Poettering
dbus-daemon currently uses a backlog of 30 on its D-Bus system bus socket. On overloaded systems this means that only 30 connections may be queued without dbus-daemon processing them before further connection attempts fail. Our cgroups-agent binary so far used D-Bus for its messaging, and hitting this limit hence may result in us losing cgroup empty messages.
This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM. Since sockets of these types need no connection set up, no listen() backlog applies. Our cgroup-agent binary will hence simply block as long as it can't enqueue its datagram message, so that we won't lose cgroup empty messages as likely anymore.
This also rearranges the ordering of the processing of SIGCHLD signals, service notification messages (sd_notify()...) and the two types of cgroup notifications (inotify for the unified hierarchy support, and agent for the classic hierarchy support). We now always process events for these in the following order:
1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7)
2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)
This is because when receiving SIGCHLD we invalidate PID information, which we need to process the service notification messages which are bound to PIDs. Hence the order between the first two items. And we want to process SIGCHLD metadata to detect whether a service is gone, before using cgroup notifications, to decide when a service is gone, since the former carries more useful metadata.
Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
https://github.com/systemd/systemd/issues/1961
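The connectionless exchange described above can be sketched with two small helpers. Everything named here (`agent_receiver_open()`, `agent_notify()`, and whatever socket path the caller picks) is hypothetical; the sketch only shows that a bound datagram socket receives messages without listen() or accept().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Fills in an AF_UNIX address for a file system path. Returns 0 or -1. */
static int agent_address(const char *path, struct sockaddr_un *sa, socklen_t *len) {
        assert(path && sa && len);

        if (strlen(path) >= sizeof(sa->sun_path))
                return -1;

        memset(sa, 0, sizeof(*sa));
        sa->sun_family = AF_UNIX;
        strcpy(sa->sun_path, path);
        *len = (socklen_t) (offsetof(struct sockaddr_un, sun_path) + strlen(path) + 1);
        return 0;
}

/* Receiver side: bind a datagram socket. No listen() or accept() is
 * involved, so no connection backlog exists. Returns the fd or -1. */
static int agent_receiver_open(const char *path) {
        struct sockaddr_un sa;
        socklen_t len;
        int fd;

        if (agent_address(path, &sa, &len) < 0)
                return -1;

        fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
                return -1;

        (void) unlink(path);    /* remove a stale socket node, if any */
        if (bind(fd, (struct sockaddr *) &sa, len) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

/* Sender side: one sendto() per notification. With a blocking socket the
 * call waits while the receiver's queue is full rather than failing. */
static int agent_notify(const char *path, const char *cgroup) {
        struct sockaddr_un sa;
        socklen_t len;
        ssize_t n;
        int fd;

        assert(cgroup);

        if (agent_address(path, &sa, &len) < 0)
                return -1;

        fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
                return -1;

        n = sendto(fd, cgroup, strlen(cgroup), 0, (struct sockaddr *) &sa, len);
        close(fd);
        return n == (ssize_t) strlen(cgroup) ? 0 : -1;
}
```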
2016-05-04 | Merge pull request #3170 from poettering/v230-preparation-fixes | Lennart Poettering
make virtualization detection quieter, rework unit start limit logic, detect unit file drop-in changes correctly, fix autofs state propagation
2016-05-03 | Merge pull request #2921 from keszybz/do-not-report-masked-units-as-changed | Zbigniew Jędrzejewski-Szmek
2016-05-03 | Revert "Do not report masked units as changed (#2921)" | Zbigniew Jędrzejewski-Szmek
This reverts commit 6d10d308c6cd16528ef58fa4f5822aef936862d3. It got squashed by mistake.
2016-05-02 | Merge pull request #3162 from keszybz/alias-refusal | Lennart Poettering
Refuse Alias, DefaultInstance, templated units in install (as appropriate)
2016-05-02 | automount: move resetting of expiry timeout to automount_set_state() | Lennart Poettering
That way we can be sure that there's no expiry timeout in place at any time when we aren't in the RUNNING state.
2016-05-02 | automount: rework propagation between automount and mount units | Lennart Poettering
Port the propagation logic to the generic Unit->trigger_notify() callback logic in the unit vtable, which is called for a unit not only when the triggered unit of it changes state but also when a job for that unit finishes. This firstly allows us to make the code a bit cleaner and more generic, but more importantly, allows us to notice correctly when a mount job fails, and propagate that back to autofs client processes. Fixes: #2181
2016-05-02 | core: don't propagate service state to sockets as long as there's still a job for the service queued | Lennart Poettering
2016-05-02 | automount: add debug message when we get notified about mount state changes | Lennart Poettering
2016-05-02 | core: remove duplicate code in automount_update_mount() | Lennart Poettering
Also, fix indentation.
2016-05-02 | core: simplify unit_need_daemon_reload() a bit | Lennart Poettering
And let's make it more accurate: if we have acquired the list of unit drop-ins, then let's do a full comparison against the old list we already have, and if things differ in any way, we know we have to reload. This makes sure we detect changes to drop-in directories in more cases.
2016-05-02 | core: fix detection whether per-unit drop-ins changed | Lennart Poettering
This fixes fall-out from 6d10d308c6cd16528ef58fa4f5822aef936862d3. Until that commit, to determine whether a daemon reload was required we compared the mtime of the main unit file we loaded with the mtime of it on disk for equality, but for drop-ins we only stored the newest mtime of all of them and then did a "newer-than" comparison. This was broken with the above commit, when all checks were changed to be for equality. With this change all checks are now done as "newer-than", fixing the drop-in mtime case. Strictly speaking this will not detect a number of changes that the code before the above commit detected, but given that the mtime is unlikely to go backwards, and this is just intended to be a helpful hint anyway, this looks OK in order to keep things simple. Fixes: #3123
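The "newer-than" rule can be shown with a small, self-contained sketch. The names (`usec_t` as a plain typedef, `dropin_newest_mtime()`, `need_daemon_reload()`) are hypothetical and the timestamps are passed in directly rather than read with stat(), to keep the comparison logic isolated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t usec_t;   /* timestamp in microseconds; 0 means "unset" */

/* Newest mtime among a unit's drop-in files, or 0 if there are none. */
static usec_t dropin_newest_mtime(const usec_t *mtimes, size_t n) {
        usec_t newest = 0;

        assert(mtimes || n == 0);

        for (size_t i = 0; i < n; i++)
                if (mtimes[i] > newest)
                        newest = mtimes[i];
        return newest;
}

/* Every check is "newer-than": a reload is suggested only when something
 * on disk is newer than what was recorded at load time. */
static bool need_daemon_reload(usec_t loaded_fragment_mtime,
                               usec_t disk_fragment_mtime,
                               usec_t loaded_dropin_mtime,
                               const usec_t *disk_dropin_mtimes,
                               size_t n_dropins) {
        if (disk_fragment_mtime > loaded_fragment_mtime)
                return true;

        return dropin_newest_mtime(disk_dropin_mtimes, n_dropins) > loaded_dropin_mtime;
}
```

As the commit message notes, this trades completeness for simplicity: a file whose mtime moved backwards is not flagged.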
2016-05-02 | core: move enforcement of the start limit into per-unit-type code again | Lennart Poettering
Let's move the enforcement of the per-unit start limit from unit.c into the type-specific files again. For unit types that know a concept of "result" codes this allows us to hook up the start limit condition to it with an explicit result code. Also, this makes sure that the state checks in calls like service_start() may be done before the start limit is checked, as the start limit really should be checked last, right before everything has been verified to be in order. The generic start limit logic is left in unit.c, but the invocation of it is moved into the per-type files, in the various xyz_start() functions, so that they may place the check at the right location. Note that this change drops the enforcement entirely from device, slice, target and scope units, since these unit types generally may not fail activation, or may only be activated a single time. This is also documented now. Note that this restores the "start-limit-hit" result code that existed before 6bf0f408e4833152197fb38fb10a9989c89f3a59 already in the service code. However, it's not introduced for all units that have a result code concept. Fixes #3166.
2016-05-01 | Move no_instances information to shared/ | Zbigniew Jędrzejewski-Szmek
This way it can be used in install.c in subsequent commit.
2016-05-01 | Move no_alias information to shared/ | Zbigniew Jędrzejewski-Szmek
This way it can be used in install.c in subsequent commit.
2016-04-30 | Merge pull request #3152 from poettering/aliasfix | Zbigniew Jędrzejewski-Szmek
Refuse aliases to non-aliasable units in more places. Fixes #2730.
2016-04-30 | core: make unit_has_mask_realized() consider controller enable state | Tejun Heo
unit_has_mask_realized() determines whether the specified unit has its cgroups set up properly given the desired target_mask; however, on the unified hierarchy, controllers need to be enabled explicitly for children and the mask of enabled controllers can deviate from target_mask. Only considering target_mask in unit_has_mask_realized() can lead to false positives and skipping enabling the requested controllers. This patch adds unit->cgroup_enabled_mask to track which controllers are enabled and updates unit_has_mask_realized() to also consider enable_mask. Signed-off-by: Tejun Heo <htejun@fb.com>
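The extended check can be sketched as below. The struct and function names are simplified, hypothetical stand-ins, and the controller bit values are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned CGroupMaskSketch;

/* Made-up controller bits for illustration. */
enum {
        CG_CPU    = 1u << 0,
        CG_IO     = 1u << 1,
        CG_MEMORY = 1u << 2,
};

/* Hypothetical subset of the per-unit cgroup state. */
typedef struct UnitCGroupState {
        bool cgroup_realized;
        CGroupMaskSketch cgroup_realized_mask; /* controllers the cgroup was set up for */
        CGroupMaskSketch cgroup_enabled_mask;  /* controllers enabled for its children */
} UnitCGroupState;

/* Realization counts as up to date only if BOTH masks match; comparing the
 * target mask alone could skip enabling newly requested controllers. */
static bool has_mask_realized(const UnitCGroupState *u,
                              CGroupMaskSketch target_mask,
                              CGroupMaskSketch enable_mask) {
        assert(u);

        return u->cgroup_realized &&
               u->cgroup_realized_mask == target_mask &&
               u->cgroup_enabled_mask == enable_mask;
}
```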
2016-04-29 | core: when encountering a symlink alias for non-aliasable units warn nicely | Lennart Poettering
If the user defines a symlink alias for a unit whose type does not support aliasing, detect this early and print a nice warning. Fixes: #2730
2016-04-29 | core: refuse merging on units when the unit type does not support alias | Lennart Poettering
The concept of merging units exists so that we can create Unit objects for a number of names early, and then load them only later, possibly merging units which then turn out to be symlinked to other names. This of course only makes sense for unit types where multiple names per unit are supported. For all others, let's refuse the merge operation early.
2016-04-29 | core: merge service_connection_unref() into service_close_socket_fd() | Lennart Poettering
We always call one after the other anyway, and this way service_set_socket_fd() and service_close_socket_fd() nicely match each other as one undoes the effect of the other.
2016-04-29 | core: rerun GC logic for a unit that loses a reference | Lennart Poettering
Let's make sure when we drop a reference to a unit, that we run the GC queue on it again. This (together with the previous commit) should deal with the GC issues pointed out in: https://github.com/systemd/systemd/pull/2993#issuecomment-215331189
2016-04-29 | core: rework socket/service GC logic | Lennart Poettering
There's no need to set the no_gc bit for service units that socket units prepare, as we always keep a proper reference (as maintained by unit_ref_set()) on them, and such references are honoured by the GC logic anyway. Moreover, explicitly setting the no_gc bit is problematic if the socket gets GC'ed for a reason, as the service might then leak with the bit set.
2016-04-29 | socket: really always close auxiliary fds when closing socket fds | Lennart Poettering
2016-04-29 | core: make sure to close connection fd when we fail to activate a per-connection service | Lennart Poettering
Fixes: #2993 #2691