path: root/src/core
Age  Commit message  Author
2016-07-12  shutdown: already sync IO before we enter the final killing spree  Lennart Poettering
This way, slow IO that journald has to wait for can't cause it to reach the killing spree timeout and be hit by SIGKILL in addition to SIGTERM.
2016-07-12  shutdown: use 90s SIGKILL timeout  Lennart Poettering
There's really no reason to use 10s here, let's instead default to 90s like we do for everything else. The SIGKILL during the final killing spree is in most regards the fourth level of a safety net, after all: any normal service should have already been stopped during the normal service shutdown logic, first via SIGTERM and then SIGKILL, and then also via SIGTERM during the final killing spree before we send SIGKILL. And as a fourth-level safety net it should only be required in exceptional cases, which means it's safe to raise the default timeout, as normal shutdowns should never be delayed by it. Note that journald excludes itself from the normal service shutdown, and relies on the final killing spree to terminate it (this is because it wants to cover the normal shutdown phase's complete logging). If the system's IO is excessively slow, then the 10s might not be enough for journald to sync everything to disk and logs might get lost during shutdown.
2016-07-12  Various fixes for typos found by lintian (#3705)  Michael Biebl
2016-07-12  seccomp: only abort on syscall name resolution failures (#3701)  Luca Bruno
seccomp_syscall_resolve_name() can return a mix of positive and negative (pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR. This commit lets the syscall filter parser only abort on real parsing failures, letting libseccomp handle pseudo-syscall number on its own and allowing proper multiplexed syscalls filtering.
2016-07-11  treewide: fix typos and remove accidental repetition of words  Torstein Husebø
2016-07-08  Merge pull request #3680 from joukewitteveen/pam-env  Evgeny Vereshchagin
Follow up on #3503 (pass service env vars to PAM sessions)
2016-07-08  execute: Do not alter call-by-ref parameter on failure  Jouke Witteveen
Prevent free from being called on (a part of) the call-by-reference variable env when setup_pam fails.
2016-07-08  core: queue loading transient units after setting their properties (#3676)  David Michael
The unit load queue can be processed in the middle of setting the unit's properties, so its load_state would no longer be UNIT_STUB for the check in bus_unit_set_properties(), which would cause it to incorrectly return an error.
2016-07-07  cgroup: fix memory cgroup limit regression on kernel 3.10 (#3673)  Daniel Mack
Commit da4d897e ("core: add cgroup memory controller support on the unified hierarchy (#3315)") changed the code in src/core/cgroup.c to always write the real numeric value from the cgroup parameters to the "memory.limit_in_bytes" attribute file. For parameters set to CGROUP_LIMIT_MAX, this results in the string "18446744073709551615" being written into that file, which is UINT64_MAX. Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1".

This causes a regression on CentOS 7, which is based on kernel 3.10, as the value is interpreted as *signed* 64 bit, and clamped to 0:

    [root@n54 ~]# echo 18446744073709551615 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
    [root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
    0
    [root@n54 ~]# echo -1 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
    [root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
    9223372036854775807

Hence, all units that are subject to the limits enforced by the memory controller will crash immediately, even though they have no actual limit set. This also happens for user.slice, for instance:

    [ 453.577153] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
    [ 453.587024] ffff880810c56780 00000000aae9501f ffff880813d7fcd0 ffffffff816360fc
    [ 453.594544] ffff880813d7fd60 ffffffff8163109c ffff88080ffc5000 ffff880813d7fd28
    [ 453.602120] ffffffff00000202 fffeefff00000000 0000000000000001 ffff880810c56c03
    [ 453.609680] Call Trace:
    [ 453.612156] [<ffffffff816360fc>] dump_stack+0x19/0x1b
    [ 453.617324] [<ffffffff8163109c>] dump_header+0x8e/0x214
    [ 453.622671] [<ffffffff8116d20e>] oom_kill_process+0x24e/0x3b0
    [ 453.628559] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
    [ 453.634969] [<ffffffff811d4155>] mem_cgroup_oom_synchronize+0x575/0x5a0
    [ 453.641721] [<ffffffff811d3520>] ? mem_cgroup_charge_common+0xc0/0xc0
    [ 453.648299] [<ffffffff8116da84>] pagefault_out_of_memory+0x14/0x90
    [ 453.654621] [<ffffffff8162f4cc>] mm_fault_error+0x68/0x12b
    [ 453.660233] [<ffffffff81642012>] __do_page_fault+0x3e2/0x450
    [ 453.666017] [<ffffffff816420a3>] do_page_fault+0x23/0x80
    [ 453.671467] [<ffffffff8163e308>] page_fault+0x28/0x30
    [ 453.676656] Task in /user.slice/user-0.slice/user@0.service killed as a result of limit of /user.slice/user-0.slice/user@0.service
    [ 453.688477] memory: usage 0kB, limit 0kB, failcnt 7
    [ 453.693391] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
    [ 453.700039] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
    [ 453.706076] Memory cgroup stats for /user.slice/user-0.slice/user@0.service: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    [ 453.725702] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
    [ 453.733614] [ 2837] 0 2837 11950 899 23 0 0 (systemd)
    [ 453.741919] Memory cgroup out of memory: Kill process 2837 ((systemd)) score 1 or sacrifice child
    [ 453.750831] Killed process 2837 ((systemd)) total-vm:47800kB, anon-rss:3188kB, file-rss:408kB

Fix this issue by special-casing the UINT64_MAX case again.
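The special-casing described in this commit message can be illustrated with a small sketch. This is not the systemd code (which is C, in src/core/cgroup.c); the function name `format_memory_limit` is hypothetical, and only the "-1" versus numeric-string behavior is taken from the message above.

```python
UINT64_MAX = 2**64 - 1
# "No limit" sentinel; the commit message states it equals UINT64_MAX.
CGROUP_LIMIT_MAX = UINT64_MAX


def format_memory_limit(limit: int) -> str:
    """Return the string to write to memory.limit_in_bytes (hypothetical helper).

    The "no limit" sentinel is written as "-1" rather than as the decimal
    form of UINT64_MAX, which kernel 3.10 is reported to clamp to 0.
    """
    if limit == CGROUP_LIMIT_MAX:
        return "-1"
    return str(limit)
```

With this shape, a unit without a configured limit never causes "18446744073709551615" to be written to the attribute file.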
2016-07-07  execute: Cleanup the environment early  Jouke Witteveen
By cleaning up before setting up PAM we maintain control of overriding behavior in setting variables. Otherwise, pam_putenv is in control. This also makes sure we use a cleaned up environment in replacing variables in argv.
2016-07-01  manager: Fixing a debug printf formatting mistake (#3640)  Kyle Walker
A 'llu' formatting statement was used in a debugging printf statement instead of a 'PRIu64'. Correcting that mistake here.
2016-06-30  Merge pull request #3634 from disneyworldguy/v2sigchld  Lennart Poettering
manager: Only invoke a single sigchld per unit within a cleanup cycle
2016-06-30  Merge pull request #3596 from poettering/machine-clean  Martin Pitt
make "machinectl clean" asynchronous, and open it up via PolicyKit
2016-06-30  manager: Only invoke a single sigchld per unit within a cleanup cycle  Kyle Walker
By default, each iteration of manager_dispatch_sigchld() results in a unit level sigchld event being invoked. For scope units, this results in a scope_sigchld_event() which can seemingly stall for workloads that have a large number of PIDs within the scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry as a result of pid_is_unwaited(). v2: This patch resolves this condition by only paying the cost of a sigchld in the underlying scope unit once per sigchld iteration. A new "sigchldgen" member resides within the Unit struct. The Manager's iteration value is incremented via the sd event loop and accessed via sd_event_get_iteration, and the Unit member is set to the same value as the Manager's each time that a sigchld event is invoked. If the Manager iteration value and Unit member match, the sigchld event is not invoked for that iteration.
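The generation-counter idea in this message can be sketched roughly as follows. The class and function names are hypothetical stand-ins for the C structures; only the "sigchldgen" member and the compare-then-update logic come from the commit message, and the iteration value is assumed to be supplied by the event loop.

```python
class Unit:
    """Minimal stand-in for a unit; not the real systemd Unit struct."""

    def __init__(self, name):
        self.name = name
        self.sigchldgen = None   # iteration in which sigchld was last dispatched
        self.sigchld_calls = 0   # counts the expensive per-unit work

    def sigchld_event(self):
        # Placeholder for the costly per-unit handler (e.g. walking all PIDs).
        self.sigchld_calls += 1


def invoke_sigchld_event(unit, iteration):
    """Dispatch at most one sigchld event per unit per event loop iteration."""
    if unit.sigchldgen == iteration:
        return False             # already handled in this iteration
    unit.sigchldgen = iteration
    unit.sigchld_event()
    return True
```

However many children of the same scope exit within one iteration, the per-unit handler runs once for that iteration.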
2016-06-24  pid1: restore console color support for containers (#3595)  Franck Bui
Commit 3a18b60489504056f9b0b1a139439cbfa60a87e1 introduced a regression that disabled the color mode for containers. This patch fixes this.
2016-06-24  cgroup: minor coding style fix  Lennart Poettering
2016-06-23  execute: add a new easy-to-use RestrictRealtime= option to units  Lennart Poettering
It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and SCHED_DEADLINE is blocked, as these scheduling policies may be used to lock up the system.
2016-06-23  execute: be a little less drastic when MemoryDenyWriteExecute= hits  Lennart Poettering
Let's politely refuse with EPERM rather than kill the whole thing right away.
2016-06-23  execute: set PR_SET_NO_NEW_PRIVS also in case the exec memory protection is used  Lennart Poettering
This was forgotten when MemoryDenyWriteExecute= was added: we should set NNP in all cases when we set seccomp filters.
2016-06-23  execute: use the return value of setrlimit_closest() properly  Lennart Poettering
It's a function defined by us, hence we should look for the error in its return value, not in "errno".
2016-06-23  core: when writing transient unit files, make sure all lines end with a newline  Lennart Poettering
This is a fix-up for 2a9a6f8ac04a69ca36d645f9305a33645f22a22b which covered non-transient units, but missed the case for transient units.
2016-06-22  watchdog: Support changing watchdog_usec during runtime (#3492)  Minkyung
Add an sd_notify() parameter to change watchdog_usec during runtime. An application can change the watchdog_usec value via sd_notify(), for example: sd_notify(0, "WATCHDOG_USEC=20000000"). To reset watchdog_usec to the value configured in the service file, restart the service. Note: sd_event is not currently supported. If the application uses sd_event_set_watchdog() or sd_watchdog_enabled(), do not use the "WATCHDOG_USEC" option through sd_notify().
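For programs that do not link against libsystemd, the same message can be sent by hand. The sketch below assumes the usual notification protocol (a datagram sent to the AF_UNIX socket named in $NOTIFY_SOCKET, with a leading "@" denoting an abstract socket); the helper name `notify` is hypothetical, and the 20-second value simply mirrors the example above.

```python
import os
import socket


def notify(state: str, env=os.environ) -> bool:
    """Send one notification datagram to the service manager.

    Returns False when $NOTIFY_SOCKET is unset, i.e. nobody is listening.
    """
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract namespace socket: replace the leading "@" with a NUL byte.
        address = b"\0" + path[1:].encode()
    else:
        address = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True
```

A service would call `notify("WATCHDOG_USEC=20000000")` to request a 20 s watchdog interval, and keep sending `notify("WATCHDOG=1")` within that interval afterwards.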
2016-06-22  Merge pull request #3526 from fbuihuu/fix-console-log-color  Lennart Poettering
Fix console log color
2016-06-22  pid1: initialize status color mode after setting up TERM  Franck Bui
Also, we had to connect PID 1's stdio to null later, since colors_enabled() assumes that stdout is connected to the console.
2016-06-22  pid1: initialize TERM environment variable correctly  Franck Bui
When systemd is started by the kernel, the kernel sets the TERM environment variable unconditionally to "linux", no matter which console device is used. This might be an issue for dumb devices with no color support. This patch uses default_term_for_tty() to get a more accurate value. But it makes sure to keep the user's preferences (if any), which might be passed via the kernel command line. For that purpose /proc should be mounted.
2016-06-20  core: log the right set of the supported controllers (#3558)  Evgeny Vereshchagin
    Jun 16 05:12:08 systemd[1]: Controller 'io' supported: yes
    Jun 16 05:12:08 systemd[1]: Controller 'memory' supported: yes
    Jun 16 05:12:08 systemd[1]: Controller 'pids' supported: yes
instead of
    Jun 16 04:06:50 systemd[1]: Controller 'memory' supported: yes
    Jun 16 04:06:50 systemd[1]: Controller 'devices' supported: yes
    Jun 16 04:06:50 systemd[1]: Controller 'pids' supported: yes
2016-06-20  Revert "do not pass-along the environment from the kernel or initrd"  Franck Bui
This reverts commit ce8aba568156f2b9d0d3b023e960cda3d9d7db81. We should pass an environment as close as possible to what we originally got.
2016-06-20  pid1: reconnect to the console before being re-executed  Franck Bui
When re-executed, reconnect the console to PID1's stdios as it was the case when PID1 was initially started by the kernel.
2016-06-18  Ensure kdbus isn't used (#3501)  Dave Reisner
Delete the dbus1 generator and some critical wiring. This prevents kdbus from being loaded or detected. As such, it will never be used, even if the user still has a useful kdbus module loaded on their system. Sort of fixes #3480. Not really, but it's better than the current state.
2016-06-16  Merge pull request #3481 from poettering/relative-memcg  Lennart Poettering
various changes, most importantly regarding memory metrics
2016-06-15  Merge pull request #3537 from poettering/journal-stream-env  Zbigniew Jędrzejewski-Szmek
Permit services to detect whether their stdout/stderr is connected to the journal.
2016-06-15  load-fragment: ignore ENOTDIR/EACCES errors (#3510)  Zbigniew Jędrzejewski-Szmek
If for whatever reason the file system is "corrupted", we want to be resilient and ignore the error, as long as we can load the units from a different place. Arch bug https://bugs.archlinux.org/task/49547. A user had an ntfs symlink (essentially a file) instead of a directory after restoring from backup. We should just ignore that like we would treat a missing directory, for general resiliency. We should treat permission errors similarly. For example an unreadable /usr/local/lib directory would prevent (user) instances of systemd from loading any units. It seems better to continue.
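The resilience policy described here can be sketched as a lookup loop that treats an unusable search directory like a missing one. This is only an illustration of the idea, not the load-fragment code; the function name and the exact set of ignored errors (ENOENT added alongside the two named in the commit) are assumptions.

```python
import errno
import os

# Errors that mean "this search directory is unusable, try the next one".
IGNORED_ERRNOS = {errno.ENOENT, errno.ENOTDIR, errno.EACCES}


def find_unit_file(name, search_dirs):
    """Return the first openable path for `name`, or None if there is none."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        try:
            with open(candidate, "rb"):
                return candidate
        except OSError as e:
            if e.errno in IGNORED_ERRNOS:
                continue         # broken or unreadable directory: keep looking
            raise                # anything else is still a real error
    return None
```

A search path entry that is really a file (the restored-from-backup case above) then no longer prevents units in later directories from being found.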
2016-06-15  core: set $JOURNAL_STREAM to the dev_t/ino_t of the journal stream of executed services  Lennart Poettering
This permits services to detect whether their stdout/stderr is connected to the journal, and if so talk to the journal directly, thus permitting carrying of metadata. As requested by the gtk folks: #2473
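A service could perform the detection along these lines. The commit message only says the variable carries the dev_t/ino_t pair; the "device:inode" decimal format used below is an assumption based on how the variable is documented in systemd.exec(5), and the helper name is hypothetical.

```python
import os


def stream_is_journal(fd, env=os.environ):
    """Return True if `fd` is the stream described by $JOURNAL_STREAM."""
    value = env.get("JOURNAL_STREAM")
    if not value:
        return False
    try:
        dev_text, ino_text = value.split(":", 1)
        dev, ino = int(dev_text), int(ino_text)
    except ValueError:
        return False             # malformed value: assume no journal
    st = os.fstat(fd)
    # The stream matches only if both device and inode numbers agree.
    return (st.st_dev, st.st_ino) == (dev, ino)
```

Comparing against the file descriptor matters because the variable is inherited: if stderr was later redirected to a file, the variable is still set but the numbers no longer match.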
2016-06-15  execute: minor coding style improvements  Lennart Poettering
2016-06-15  tree-wide: htonl() is weird, let's use htobe32() instead (#3538)  Lennart Poettering
Super-important change, yeah!
2016-06-14  unit: properly comment generated comments in unit files  Lennart Poettering
Fix-up for 2a9a6f8ac04a69ca36d645f9305a33645f22a22b
2016-06-14  systemctl: allow percent-based MemoryLimit= settings via systemctl set-property  Lennart Poettering
The unit files already accept relative, percent-based memory limit specification, let's make sure "systemctl set-property" supports this too. Since we want the physical memory size of the destination machine to apply, we pass the percentage in a new set of properties that only exist for this purpose, and can only be set.
2016-06-14  util: introduce physical_memory_scale() to unify how we scale by physical memory  Lennart Poettering
The various bits of code all did the scaling differently, let's unify this, given that the code is not trivial.
2016-06-14  core: make sure to use "infinity" in unit files, not "max"  Lennart Poettering
The latter is a kernelism, we only understand "infinity".
2016-06-14  core: when receiving a memory limit via the bus, refuse 0  Lennart Poettering
When parsing unit files we already refuse unit memory limits of zero, let's also refuse it when the value is set via the bus.
2016-06-14  core: optionally, accept a percentage value for MemoryLimit= and related settings  Lennart Poettering
If a percentage is used, it is taken relative to the installed RAM size. This should make it easier to write generic unit files that adapt to the local system.
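How a percentage could be resolved against the installed RAM can be sketched as below. The function names echo parse_percent() from the neighbouring commit but this is not that code: the accepted syntax, the integer rounding, and the handling of plain byte counts (no K/M/G suffixes, no "infinity") are simplifying assumptions. Refusing a result of zero follows the "refuse 0" commit above.

```python
def parse_percent(text: str) -> int:
    """Parse a string such as "20%" into the integer 20."""
    if not text.endswith("%"):
        raise ValueError("not a percentage: %r" % text)
    digits = text[:-1]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("not a percentage: %r" % text)
    return int(digits)


def resolve_memory_limit(setting: str, physical_memory: int) -> int:
    """Turn a MemoryLimit=-style value into bytes (simplified sketch)."""
    if setting.endswith("%"):
        limit = physical_memory * parse_percent(setting) // 100
    else:
        limit = int(setting)     # plain byte count only, in this sketch
    if limit <= 0:
        raise ValueError("a memory limit of zero is refused")
    return limit
```

On a machine with 8 GiB of RAM, "50%" resolves to 4 GiB, while the same unit file on a 2 GiB machine yields 1 GiB.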
2016-06-14  util-lib: introduce parse_percent() for parsing percent specifications  Lennart Poettering
And port a couple of users over to it.
2016-06-14  manager: reduce complexity of unit_gc_sweep (#3507)  Lukáš Nykrýn
When a unit is marked as UNSURE, we keep trying to find out whether its state has changed, over and over again. So let's not go through the UNSURE states again. Also, when we find a GOOD unit, let's propagate the GOOD state to all units that this unit references. This is a problem on machines with a lot of initscripts with different starting priorities, since those units will reference each other and the original algorithm might reach n! complexity. Thanks to HATAYAMA Daisuke for the expand_good_state code.
2016-06-14  core: on unified we don't need to check u->pids: we can use proper notifications (#3531)  Evgeny Vereshchagin
Fixes: #3483
2016-06-13  core: parse `rd.rescue` and `rd.emergency` as initrd-specific shorthands (#3488)  Ivan Shapovalov
Typing `rd.rescue` is easier than `rd.systemd.unit=rescue.target`.
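The shorthand expansion can be pictured as a small table lookup over the kernel command line. This is an illustrative sketch, not the PID 1 parser: the function name is hypothetical, the mapping of `rd.emergency` to emergency.target is assumed by analogy with the `rd.rescue` example, and quoting rules of the real command line are ignored.

```python
# Shorthand -> unit mapping; the second entry is inferred by analogy.
INITRD_SHORTHANDS = {
    "rd.rescue": "rescue.target",
    "rd.emergency": "emergency.target",
}


def initrd_default_unit(cmdline: str, in_initrd: bool):
    """Return the unit requested for the initrd, or None if nothing applies."""
    if not in_initrd:
        return None              # rd.* options only matter inside the initrd
    unit = None
    for word in cmdline.split():
        if word.startswith("rd.systemd.unit="):
            unit = word.split("=", 1)[1]
        elif word in INITRD_SHORTHANDS:
            unit = INITRD_SHORTHANDS[word]
    return unit
```

Under these assumptions, `initrd_default_unit("root=/dev/sda1 rd.rescue", True)` gives the same answer as spelling out `rd.systemd.unit=rescue.target`.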
2016-06-13  core/execute: pass env vars to PAM session setup (#3503)  Jouke Witteveen
Move the merger of environment variables before setting up the PAM session and pass the aggregate environment to PAM setup. This allows control over the PAM session hooks through environment variables. PAM session initiation may update the environment. On successful initiation of a PAM session, we adopt the environment of the PAM context.
2016-06-10  core: disable colors when displaying cylon when systemd.log_color=off (#3495)  Franck Bui
2016-06-10  core/execute: add the magic character '!' to allow privileged execution (#3493)  Alessandro Puccetti
This patch implements the new magic character '!'. By putting '!' in front of a command, systemd executes it with full privileges, ignoring parameters such as User, Group, SupplementaryGroups, CapabilityBoundingSet, AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext, AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.

Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482

Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service (you can use the example below)
3. sudo systemctl start exec-perm.service
4. Verify that the commands starting with '!' were not executed as bob:
4.1 Look at the output of ls -l /tmp/exec-perm
4.2 Each file contains the result of the id command.

`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm

[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ; /usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ; !/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"

[Install]
WantedBy=multi-user.target
`````````````````````````````````````````````````````````````````
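To make the per-command nature of the '!' prefix concrete, here is a rough sketch of how an Exec*= value like the ones in the test unit could be split into commands and flagged. It is not the load-fragment parser: the function name is hypothetical, `shlex` stands in for systemd's own quoting rules, and other prefix characters are deliberately left out.

```python
import shlex


def parse_exec_line(value: str):
    """Split an Exec*= value into (privileged, argv) pairs.

    Commands are separated by a standalone ";" token. A leading "!" on a
    command's first word marks it as privileged and is stripped.
    Caveat of this sketch: a quoted ";" argument is also seen as a separator.
    """
    commands, current = [], []
    for token in shlex.split(value) + [";"]:
        if token != ";":
            current.append(token)
            continue
        if current:
            privileged = current[0].startswith("!")
            if privileged:
                current[0] = current[0][1:]
            commands.append((privileged, current))
        current = []
    return commands
```

Applied to the ExecStartPre= line of the test unit, the first command comes back flagged as privileged and the second one does not, which matches the expectation that only the '!' commands run outside of User=bob.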
2016-06-09  load-fragment: don't try to do a template instance replacement if we are not an instance (#3451)  Lennart Poettering
Corrects: 7aad67e7 Fixes: #3438
2016-06-09  execute: check whether the specified fd is a tty before chowning/chmoding it (#3457)  Lennart Poettering
Let's add an extra safety check before we chmod/chown a TTY to the right user, as we might end up having connected something to STDIN/STDOUT that is actually not a TTY, even though this might have been requested, due to permissive StandardInput= settings or transient service activation with fds passed in. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=85255
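The safety check amounts to asking whether the descriptor really is a terminal before touching its ownership. A minimal sketch, assuming a hypothetical helper name and an illustrative default mode of 0600 (neither is taken from the commit):

```python
import os


def chown_terminal(fd: int, uid: int, gid: int, mode: int = 0o600) -> bool:
    """Hand a TTY over to uid/gid; do nothing if `fd` is not a TTY.

    Returns True if ownership and mode were changed, False if skipped.
    """
    if not os.isatty(fd):
        return False             # e.g. a pipe, socket or file passed in as stdin
    os.fchown(fd, uid, gid)
    os.fchmod(fd, mode)
    return True
```

With the check in place, a pipe or regular file that ended up on STDIN is left untouched instead of being chowned to the service user.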