Age | Commit message (Collapse) | Author |
|
Cgroup fixes.
|
|
make virtualization detection quieter, rework unit start limit logic, detect unit file drop-in changes correctly, fix autofs state propagation
|
|
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in clal like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right before everything has been verified
to be in order.
The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.
Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.
Note that restores the "start-limit-hit" result code that existed before
6bf0f408e4833152197fb38fb10a9989c89f3a59 already in the service code. However,
it's not introduced for all units that have a result code concept.
Fixes #3166.
|
|
This way it can be used in install.c in subsequent commit.
|
|
This way it can be used in install.c in subsequent commit.
|
|
unit_has_mask_realized() determines whether the specified unit has its cgroups
set up properly given the desired target_mask; however, on the unified
hierarchy, controllers need to be enabled explicitly for children and the mask
of enabled controllers can deviate from target_mask. Only considering
target_mask in unit_has_mask_realized() can lead to false positives and
skipping enabling the requested controllers.
This patch adds unit->cgroup_enabled_mask to track which controllers are
enabled and updates unit_has_mask_realized() to also consider enable_mask.
Signed-off-by: Tejun Heo <htejun@fb.com>
|
|
This adds a new GetProcesses() bus call to the Unit object which returns an
array consisting of all PIDs, their process names, as well as their full cgroup
paths. This is then used by "systemctl status" to show the per-unit process
tree.
This has the benefit that the client-side no longer needs to access the
cgroupfs directly to show the process tree of a unit. Instead, it now uses this
new API, which means it also works if -H or -M are used correctly, as the
information from the specific host is used, and not the one from the local
system.
Fixes: #2945
|
|
With this change the logic for placing transient unit files and drop-ins
generated via "systemctl set-property" is reworked.
The latter are now placed in the newly introduced "control" unit file
directory. The fomer are now placed in the "transient" unit file directory.
Note that the properties originally set when a transient unit was created will
be written to and stay in the transient unit file directory, while later
changes are done via drop-ins.
This is preparation for a later "systemctl revert" addition, where existing
drop-ins are flushed out, but the original transient definition is restored.
|
|
Remove some old cruft
|
|
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
|
|
unit, not just services
This moves the StartLimitBurst=, StartLimitInterval=, StartLimitAction=, RebootArgument= from the [Service] section
into the [Unit] section of unit files, and thus support it in all unit types, not just in services.
This way we can enforce the start limit much earlier, in particular before testing the unit conditions, so that
repeated start-up failure due to failed conditions is also considered for the start limit logic.
For compatibility the four options may also be configured in the [Service] section still, but we only document them in
their new section [Unit].
This also renamed the socket unit failure code "service-failed-permanent" into "service-start-limit-hit" to express
more clearly what it is about, after all it's only triggered through the start limit being hit.
Finally, the code in busname_trigger_notify() and socket_trigger_notify() is altered to become more alike.
Fixes: #2467
|
|
events correctly
|
|
This adds a new timestamp field to the Unit struct, storing when the last low-level state change took place, and make
sure this is restored after a daemon reload. This new field is useful to allow restarting of per-state timers exactly
where they originally started.
|
|
If a mount unit is bound to a device, systemd tries to umount the
mount point, if it thinks the device has gone away.
Due to the uevent queue and inotify of /proc/self/mountinfo being two
different sources, systemd can never get the ordering reliably correct.
It can happen, that in the uevent queue ADD,REMOVE,ADD is queued
and an inotify of mountinfo (or libmount event) happend with the
device in question.
systemd cannot know, at which point of time the mount happend in the
ADD,REMOVE,ADD sequence.
The real ordering might have been ADD,REMOVE,ADD,mount
and systemd might think ADD,mount,REMOVE,ADD and would umount the
mountpoint.
A test script which triggered this behaviour is:
rm -f test-efi-disk.img
dd if=/dev/null of=test-efi-disk.img bs=1M seek=512 count=1
parted --script test-efi-disk.img \
"mklabel gpt" \
"mkpart ESP fat32 1MiB 511MiB" \
"set 1 boot on"
LOOP=$(losetup --show -f -P test-efi-disk.img)
udevadm settle
mkfs.vfat -F32 ${LOOP}p1
mkdir -p mnt
mount ${LOOP}p1 mnt
... <dostuffwith mnt>
Without the "udevadm settle" systemd unmounted mnt while the script was
operating on mnt.
Of course the question is, why there was a REMOVE in the first place,
but this is not part of this patch.
|
|
This is a continuation of the previous include sort patch, which
only sorted for .c files.
|
|
Lets introduce unit_is_pristine() that verifies whether a unit is
suitable to become a transient unit, by checking that it is no
referenced yet and has no data on disk assigned.
|
|
bool anymore
|
|
variety of fixes
|
|
Snapshots were never useful or used for anything. Many systemd
developers that I spoke to at systemd.conf2015, didn't even know they
existed, so it is fairly safe to assume that this type can be deleted
without harm.
The fundamental problem with snapshots is that the state of the system
is dynamic, devices come and go, users log in and out, timers fire...
and restoring all units to some state from the past would "undo"
those changes, which isn't really possible.
Tested by creating a snapshot, running the new binary, and checking
that the transition did not cause errors, and the snapshot is gone,
and snapshots cannot be created anymore.
New systemctl says:
Unknown operation snapshot.
Old systemctl says:
Failed to create snapshot: Support for snapshots has been removed.
IgnoreOnSnaphost settings are warned about and ignored:
Support for option IgnoreOnSnapshot= has been removed and it is ignored
http://lists.freedesktop.org/archives/systemd-devel/2015-November/034872.html
|
|
We can't handle errors of thisc all sanely anyway, and we never actually
return any errors from the unit type that implements the call. Hence,
let's make this void, in order to simplify things.
|
|
We cannot handle enumeration failures in a sensible way, hence let's try
hard to continue without making such failures fatal, and log about it
with precise error messages.
|
|
|
|
Let's use strjoina() rather than strjoin() for construct dbus match
strings.
Also, while we are at it, fix parameter ordering, so that our functions
always put the object first, like it is customary for OO-like
programming.
|
|
When starting a transient service, allow setting stdin/stdout/stderr fds
for it, by passing them in via the bus.
This also simplifies some of the serialization code for units.
|
|
Preparation to allow systemctl to query the list of unit states.
|
|
Add a new config directive called NetClass= to CGroup enabled units.
Allowed values are positive numbers for fix assignments and "auto" for
picking a free value automatically, for which we need to keep track of
dynamically assigned net class IDs of units. Introduce a hash table for
this, and also record the last ID that was given out, so the allocator
can start its search for the next 'hole' from there. This could
eventually be optimized with something like an irb.
The class IDs up to 65536 are considered reserved and won't be
assigned automatically by systemd. This barrier can be made a config
directive in the future.
Values set in unit files are stored in the CGroupContext of the
unit and considered read-only. The actually assigned number (which
may have been chosen dynamically) is stored in the unit itself and
is guaranteed to remain stable as long as the unit is active.
In the CGroup controller, set the configured CGroup net class to
net_cls.classid. Multiple unit may share the same net class ID,
and those which do are linked together.
|
|
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.
A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).
It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.
The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.
This patch also removes cg_delete() which is unused now.
On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.
This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.
This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.
The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.
To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.
This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.
When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
|
|
This adds a new call unit_set_slice(), and simplifies
unit_add_default_slice(). THis should make our code a bit more robust
and simpler.
|
|
|
|
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.
Also ports over machinectl to make use of this.
|
|
Currently, PID1 installs an unfiltered NameOwnerChanged signal match, and
dispatches the signals itself. This does not scale, as right now, PID1
wakes up every time a bus client connects.
To fix this, install individual matches once they are requested by
unit_watch_bus_name(), and remove the watches again through their slot in
unit_unwatch_bus_name().
If the bus is not available during unit_watch_bus_name(), just store
name in the 'watch_bus' hashmap, and let bus_setup_api() do the installing
later.
|
|
For instantaneous jobs (e.g. starting of targets, sockets, slices, or
Type=simple services) the log shows the job completion
before starting:
systemd[1]: Created slice -.slice.
systemd[1]: Starting -.slice.
systemd[1]: Created slice System Slice.
systemd[1]: Starting System Slice.
systemd[1]: Listening on Journal Audit Socket.
systemd[1]: Starting Journal Audit Socket.
systemd[1]: Reached target Timers.
systemd[1]: Starting Timers.
...
The reason is that the job completes before the ->start() method returns
and only then does unit_start() print the "Starting ..." message.
The same thing happens when stopping units.
Rather than fixing the order of the messages, let's just not emit the
Starting/Stopping message at all when the job completes instantaneously.
The job completion message is sufficient in this case.
|
|
No distro ships that old systemd versions anyway, hence let's drop
support for live-upgrades for them. Offline updates are still supported.
And live-upgrades will only lose the job queue, hence basically still
work...
|
|
This extends on bea355dac94e82697aa98e25d80ee4248263bf92, and extends
the ratelimiter to not only be used for StopWhenUnneeded=1 units but
also for units that have BindsTo= on a unit that is dead.
http://lists.freedesktop.org/archives/systemd-devel/2015-April/030224.html
|
|
|
|
Otherwise we might end up in an endless stop loop.
http://lists.freedesktop.org/archives/systemd-devel/2015-April/030224.html
|
|
The call is only used by the mount and automount unit types, but that's
already enough to consider it generic unit functionality, hence move it
out of mount.c and into unit.c.
|
|
This changes log_unit_info() (and friends) to take a real Unit* object
insted of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed romat strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.
Also contains a couple of other logging improvements:
- Drops a couple of strerror() invocations in favour of using %m.
- Not only .mount units now warn if a symlinks exist for the mount
point already, .automount units do that too, now.
- A few invocations of log_struct() that didn't actually pass any
additional structured data have been replaced by simpler invocations
of log_unit_info() and friends.
- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
that works like LOG_MESSAGE() but prefixes the message with the unit
name. Similar, there's now LOG_LINK_MESSAGE() and
LOG_NETDEV_MESSAGE().
- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
LOG_NETDEV_INTERFACE() macros have been added that generate the
necessary per object fields. The old log_unit_struct() call has been
removed in favour of these new macros used in raw log_struct()
invocations. In addition to removing one more function call this
allows generated structured log messages that contain two object
fields, as necessary for example for network interfaces that are
joined into another network interface, and whose messages shall be
indexed by both.
- The LOG_ERRNO() macro has been removed, in favour of
log_struct_errno(). The latter has the benefit of ensuring that %m in
format strings is properly resolved to the specified error number.
- A number of logging messages have been converted to use
log_unit_info() instead of log_info()
- The client code in sysv-generator no longer #includes core code from
src/core/.
- log_unit_full_errno() has been removed, log_unit_full() instead takes
an errno now, too.
- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
avoid double evaluation of their parameters
|
|
Introduce a new call unit_type_supported() and make use of it
everywhere.
Also, drop Manager parameter from per-type supported method prototype.
|
|
Let's make sure that we don't enqueue triggering jobs for units before
those units are actually fully loaded.
http://lists.freedesktop.org/archives/systemd-devel/2015-April/031176.html
https://bugs.freedesktop.org/show_bug.cgi?id=88401
|
|
This reverts commit 6e392c9c45643d106673c6643ac8bf4e65da13c1.
We really shouldn't invent external state keeping hashmaps, if we can
keep this state in the units themselves.
|
|
|
|
Because the order of coldplugging is not defined, we can reference a
not-yet-coldplugged unit and read its state while it has not yet been
set to a meaningful value.
This way, already active units may get started again.
We fix this by deferring such actions until all units have been at
least somehow coldplugged.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401
|
|
This adds support for showing the accumulated consumed CPU time per-unit
in the "systemctl status" output. The property is also readable via the
bus.
|
|
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
This fixes various issues found by globally reordering the include
sections of all .c files.
|
|
With this change it is possible to send file descriptors to PID 1, via
sd_pid_notify_with_fds() which PID 1 will store individually for each
service, and pass via the usual fd passing logic on next invocation.
This is useful for enable daemon reload schemes where daemons serialize
their state to /run, push their fds into PID 1 and terminate, restoring
their state on next start from the data in /run and passed in from PID
1.
The fds are kept by PID 1 as long as no POLLHUP or POLLERR is seen on
them, and the service they belong to are either not dead or failed, or
have a job queued.
|
|
Containers do not really support .device, .automount or .swap units;
Systems compiled without support for swap do not support .swap units;
Systems without kdbus do not support .busname units.
With this change attempts to start a unsupported unit types will result
in an immediate "unsupported" job result, which is a lot more
descriptive then before. Also, attempts to start device units in
containers will now immediately fail instead of causing jobs to be
enqueued that never go away.
|
|
|
|
|