summaryrefslogtreecommitdiff
path: root/src/core
AgeCommit message (Collapse)Author
2016-10-21Revert "pid1: reconnect to the console before being re-executed"systemd/v231-3Zbigniew Jędrzejewski-Szmek
This reverts commit affd7ed1a923b0df8479cff1bd9eafb625fdaa66. > So it looks like make_console_stdio() has bad side effect. More specifically it > does a TIOCSCTTY ioctl (via acquire_terminal()) which sees to disturb the > process which was using/owning the console. Fixes #3842. https://bugs.debian.org/834367 https://bugzilla.redhat.com/show_bug.cgi?id=1367766 (cherry picked from commit bd64d82c1c0e3fe2a5f9b3dd9132d62834f50b2d)
2016-10-21pid1: don't return any error in manager_dispatch_notify_fd() (#4240)Franck Bui
If manager_dispatch_notify_fd() fails and returns an error then the handling of service notifications will be disabled entirely leading to a compromised system. For example pid1 won't be able to receive the WATCHDOG messages anymore and will kill all services supposed to send such messages. (cherry picked from commit 9987750e7a4c62e0eb8473603150596ba7c3a015)
2016-10-21pid1: process zero-length notification messages againZbigniew Jędrzejewski-Szmek
This undoes 531ac2b234. I acked that patch without looking at the code carefully enough. There are two problems: - we want to process the fds anyway - in principle empty notification messages are valid, and we should process them as usual, including logging using log_unit_debug(). (cherry picked from commit 8523bf7dd514a3a2c6114b7b8fb8f308b4f09fc4)
2016-10-21If the notification message length is 0, ignore the message (#4237)Jorge Niedbalski
Fixes #4234. Signed-off-by: Jorge Niedbalski <jnr@metaklass.org> (cherry picked from commit 531ac2b2349da02acc9c382849758e07eb92b020)
2016-07-25automount: don't cancel mount/umount request on reload/reexec (#3670)Michael Olbrich
All pending tokens are already serialized correctly and will be handled when the mount unit is done. Without this a 'daemon-reload' cancels all pending tokens. Any process waiting for the mount will continue with EHOSTDOWN. This can happen when the mount unit waits for it's dependencies, e.g. network, devices, fsck, etc.
2016-07-25transaction: don't cancel jobs for units with IgnoreOnIsolate=true (#3671)Michael Olbrich
This is important if a job was queued for a unit but not yet started. Without this, the job will be canceled and is never executed even though IgnoreOnIsolate it set to 'true'.
2016-07-25core: change ExecStart=! syntax to ExecStart=+ (#3797)Lennart Poettering
As suggested by @mbiebl we already use the "!" special char in unit file assignments for negation, hence we should not use it in a different context for privileged execution. Let's use "+" instead.
2016-07-22Merge pull request #3777 from poettering/id128-reworkZbigniew Jędrzejewski-Szmek
uuid/id128 code rework
2016-07-22Merge pull request #3753 from poettering/tasks-max-scaleLennart Poettering
Add support for relative TasksMax= specifications, and bump default for services
2016-07-22cgroup: whitelist inaccessible devices for "auto" and "closed" DevicePolicy.Alessandro Puccetti
https://github.com/systemd/systemd/pull/3685 introduced /run/systemd/inaccessible/{chr,blk} to map inacessible devices, this patch allows systemd running inside a nspawn container to create /run/systemd/inaccessible/{chr,blk}.
2016-07-22core: check for overflow when handling scaled MemoryLimit= settingsLennart Poettering
Just in case...
2016-07-22macros.systemd.in: add %systemd_ordering (#3776)Harald Hoyer
To remove the hard dependency on systemd, for packages, which function without a running systemd the %systemd_ordering macro can be used to ensure ordering in the rpm transaction. %systemd_ordering makes sure, the systemd rpm is installed prior to the package, so the %pre/%post scripts can execute the systemd parts. Installing systemd afterwards though, does not result in the same outcome.
2016-07-22core: change TasksMax= default for system services to 15%Lennart Poettering
As it turns out 512 is max number of tasks per service is hit by too many applications, hence let's bump it a bit, and make it relative to the system's maximum number of PIDs. With this change the new default is 15%. At the kernel's default pids_max value of 32768 this translates to 4915. At machined's default TasksMax= setting of 16384 this translates to 2457. Why 15%? Because it sounds like a round number and is close enough to 4096 which I was going for, i.e. an eight-fold increase over the old 512 Summary: | on the host | in a container old default | 512 | 512 new default | 4915 | 2457
2016-07-22main: simplify things a bit by moving container check into fixup_environment()Lennart Poettering
2016-07-22core: rename MemoryLimitByPhysicalMemory transient property to MemoryLimitScaleLennart Poettering
That way, we can neatly keep this in line with the new TasksMaxScale= option. Note that we didn't release a version with MemoryLimitByPhysicalMemory= yet, hence this change should be unproblematic without breaking API.
2016-07-22core: support percentage specifications on TasksMax=Lennart Poettering
This adds support for a TasksMax=40% syntax for specifying values relative to the system's configured maximum number of processes. This is useful in order to neatly subdivide the available room for tasks within containers.
2016-07-22core: rework machine-id-setup.c to use the calls from id128-util.[ch]Lennart Poettering
This allows us to delete quite a bit of code and make the whole thing a lot shorter.
2016-07-22main: make sure set_machine_id() doesn't clobber arg_machine_id on failureLennart Poettering
2016-07-22machine-id-setup: port machine_id_commit() to new id128-util.c APIsLennart Poettering
2016-07-22sd-id128: split UUID file read/write code into new id128-util.[ch]Lennart Poettering
We currently have code to read and write files containing UUIDs at various places. Unify this in id128-util.[ch], and move some other stuff there too. The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/), because they are actually the backend of sd_id128_get_machine() and sd_id128_get_boot(). In follow-up patches we can use this reduce the code in nspawn and machine-id-setup by adopted the common implementation.
2016-07-22Merge pull request #3762 from poettering/sigkill-logMartin Pitt
log about all processes we forcibly kill
2016-07-22Merge pull request #3764 from poettering/assorted-stuff-2Martin Pitt
Assorted fixes
2016-07-21nspawn: enable major=0/minor=0 devices inside the container (#3773)Alessandro Puccetti
https://github.com/systemd/systemd/pull/3685 introduced /run/systemd/inaccessible/{chr,blk} to map inacessible devices, this patch allows systemd running inside a nspawn container to create /run/systemd/inaccessible/{chr,blk}.
2016-07-21core: remove duplicate includes (#3771)Thomas H. P. Andersen
2016-07-20namespace: fix wrong return value from mount(2) (#3758)Topi Miettinen
Fix bug introduced by #3263: mount(2) return value is 0 or -1, not errno. Thanks to Evgeny Vereshchagin (@evverx) for reporting.
2016-07-20execute: make sure JoinsNamespaceOf= doesn't leak ns fds to executed processesLennart Poettering
2016-07-20namespace: add a (void) castLennart Poettering
2016-07-20core: normalize header inclusion in execute.h a bitLennart Poettering
We don't actually need any functionality from cgroup.h in execute.h, hence don't include that. However, we do need the Unit structure from unit.h, hence include that, and move it as late as possible, since it needs the definitions from execute.h.
2016-07-20execute: normalize connect_logger_as() parameters slightlyLennart Poettering
All other functions in execute.c that need the unit id take a Unit* parameter as first argument. Let's change connect_logger_as() to follow a similar logic.
2016-07-20core: when a scope was abandoned, always log about processes we killLennart Poettering
After all, if a unit is abandoned, all processes inside of it may be considered "left over" and are something we should better log about.
2016-07-20core: make sure RequestStop signal is send directedLennart Poettering
This was accidentally left commented out for debugging purposes, let's fix that and make the signal directed again.
2016-07-20core: when forcibly killing/aborting left-over unit processes log about itLennart Poettering
Let's lot at LOG_NOTICE about any processes that we are going to SIGKILL/SIGABRT because clean termination of them didn't work. This turns the various boolean flag parameters to cg_kill(), cg_migrate() and related calls into a single binary flags parameter, simply because the function now gained even more parameters and the parameter listed shouldn't get too long. Logging for killing processes is done either when the kill signal is SIGABRT or SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE is passed. This isn't used yet in this patch, but is made use of in a later patch.
2016-07-20namespace: minor improvementsLennart Poettering
We generally try to avoid strerror(), due to its threads-unsafety, let's do this here, too. Also, let's be tiny bit more explanatory with the log messages, and let's shorten a few things.
2016-07-20core: hide legacy bus propertiesLennart Poettering
We usually hide legacy bus properties from introspection. Let's do that for the InaccessibleDirectories= properties too. The properties stay accessible if requested, but they won't be listed anymore if people introspect the unit.
2016-07-19doc,core: Read{Write,Only}Paths= and InaccessiblePaths=Alessandro Puccetti
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept as aliases but they are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths` `read_only_dirs` --> `read_only_paths` `inaccessible_dirs` --> `inaccessible_paths`
2016-07-19namespace: unify limit behavior on non-directory pathsAlessandro Puccetti
Despite the name, `Read{Write,Only}Directories=` already allows for regular file paths to be masked. This commit adds the same behavior to `InaccessibleDirectories=` and makes it explicit in the doc. This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}` {dile,device}nodes and mounts on the appropriate one the paths specified in `InacessibleDirectories=`. Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
2016-07-16manager: don't skip sigchld handler for main and control pid for services ↵Lukáš Nykrýn
(#3738) During stop when service has one "regular" pid one main pid and one control pid and the sighld for the regular one is processed first the unit_tidy_watch_pids will skip the main and control pid and does not remove them from u->pids(). But then we skip the sigchld event because we already did one in the iteration and there are two pids in u->pids. v2: Use general unit_main_pid() and unit_control_pid() instead of reaching directly to service structure.
2016-07-15tree-wide: get rid of selinux_context_t (#3732)Zbigniew Jędrzejewski-Szmek
https://github.com/SELinuxProject/selinux/commit/9eb9c9327563014ad6a807814e7975424642d5b9 deprecated selinux_context_t. Replace with a simple char* everywhere. Alternative fix for #3719.
2016-07-15macros: provide %_systemdgeneratordir and %_systemdusergeneratordir (#3672)Zbigniew Jędrzejewski-Szmek
... as requested in https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/DJ7HDNRM5JGBSA4HL3UWW5ZGLQDJ6Y7M/. Adding the macro makes it marginally easier to create generators for outside projects. I opted for "generatordir" and "usergeneratordir" to match %unitdir and %userunitdir. OTOH, "_systemd" prefix makes it obvious that this is related to systemd. "%_generatordir" would be to generic of a name.
2016-07-12shutdown: already sync IO before we enter the final killing spreeLennart Poettering
This way, slow IO journald has to wait for can't cause it to reach the killing spree timeout and is hit by SIGKILL in addition to SIGTERM.
2016-07-12shutdown: use 90s SIGKILL timeoutLennart Poettering
There's really no reason to use 10s here, let's instead default to 90s like we do for everything else. The SIGKILL during the final killing spree is in most regards the fourth level of a safety net, after all: any normal service should have already been stopped during the normal service shutdown logic, first via SIGTERM and then SIGKILL, and then also via SIGTERM during the finall killing spree before we send SIGKILL. And as a fourth level safety net it should only be required in exceptional cases, which means it's safe to rais the default timeout, as normal shutdowns should never be delayed by it. Note that journald excludes itself from the normal service shutdown, and relies on the final killing spree to terminate it (this is because it wants to cover the normal shutdown phase's complete logging). If the system's IO is excessively slow, then the 10s might not be enough for journald to sync everything to disk and logs might get lost during shutdown.
2016-07-12Various fixes for typos found by lintian (#3705)Michael Biebl
2016-07-12seccomp: only abort on syscall name resolution failures (#3701)Luca Bruno
seccomp_syscall_resolve_name() can return a mix of positive and negative (pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR. This commit lets the syscall filter parser only abort on real parsing failures, letting libseccomp handle pseudo-syscall number on its own and allowing proper multiplexed syscalls filtering.
2016-07-11treewide: fix typos and remove accidental repetition of wordsTorstein Husebø
2016-07-08Merge pull request #3680 from joukewitteveen/pam-envEvgeny Vereshchagin
Follow up on #3503 (pass service env vars to PAM sessions)
2016-07-08execute: Do not alter call-by-ref parameter on failureJouke Witteveen
Prevent free from being called on (a part of) the call-by-reference variable env when setup_pam fails.
2016-07-08core: queue loading transient units after setting their properties (#3676)David Michael
The unit load queue can be processed in the middle of setting the unit's properties, so its load_state would no longer be UNIT_STUB for the check in bus_unit_set_properties(), which would cause it to incorrectly return an error.
2016-07-07cgroup: fix memory cgroup limit regression on kernel 3.10 (#3673)Daniel Mack
Commit da4d897e ("core: add cgroup memory controller support on the unified hierarchy (#3315)") changed the code in src/core/cgroup.c to always write the real numeric value from the cgroup parameters to the "memory.limit_in_bytes" attribute file. For parameters set to CGROUP_LIMIT_MAX, this results in the string "18446744073709551615" being written into that file, which is UINT64_MAX. Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1". This causes a regression on CentOS 7, which is based on kernel 3.10, as the value is interpreted as *signed* 64 bit, and clamped to 0: [root@n54 ~]# echo 18446744073709551615 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes [root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes 0 [root@n54 ~]# echo -1 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes [root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes 9223372036854775807 Hence, all units that are subject to the limits enforced by the memory controller will crash immediately, even though they have no actual limit set. This happens to for the user.slice, for instance: [ 453.577153] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014 [ 453.587024] ffff880810c56780 00000000aae9501f ffff880813d7fcd0 ffffffff816360fc [ 453.594544] ffff880813d7fd60 ffffffff8163109c ffff88080ffc5000 ffff880813d7fd28 [ 453.602120] ffffffff00000202 fffeefff00000000 0000000000000001 ffff880810c56c03 [ 453.609680] Call Trace: [ 453.612156] [<ffffffff816360fc>] dump_stack+0x19/0x1b [ 453.617324] [<ffffffff8163109c>] dump_header+0x8e/0x214 [ 453.622671] [<ffffffff8116d20e>] oom_kill_process+0x24e/0x3b0 [ 453.628559] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30 [ 453.634969] [<ffffffff811d4155>] mem_cgroup_oom_synchronize+0x575/0x5a0 [ 453.641721] [<ffffffff811d3520>] ? mem_cgroup_charge_common+0xc0/0xc0 [ 453.648299] [<ffffffff8116da84>] pagefault_out_of_memory+0x14/0x90 [ 453.654621] [<ffffffff8162f4cc>] mm_fault_error+0x68/0x12b [ 453.660233] [<ffffffff81642012>] __do_page_fault+0x3e2/0x450 [ 453.666017] [<ffffffff816420a3>] do_page_fault+0x23/0x80 [ 453.671467] [<ffffffff8163e308>] page_fault+0x28/0x30 [ 453.676656] Task in /user.slice/user-0.slice/user@0.service killed as a result of limit of /user.slice/user-0.slice/user@0.service [ 453.688477] memory: usage 0kB, limit 0kB, failcnt 7 [ 453.693391] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0 [ 453.700039] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0 [ 453.706076] Memory cgroup stats for /user.slice/user-0.slice/user@0.service: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB [ 453.725702] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 453.733614] [ 2837] 0 2837 11950 899 23 0 0 (systemd) [ 453.741919] Memory cgroup out of memory: Kill process 2837 ((systemd)) score 1 or sacrifice child [ 453.750831] Killed process 2837 ((systemd)) total-vm:47800kB, anon-rss:3188kB, file-rss:408kB Fix this issue by special-casing the UINT64_MAX case again.
2016-07-07execute: Cleanup the environment earlyJouke Witteveen
By cleaning up before setting up PAM we maintain control of overriding behavior in setting variables. Otherwise, pam_putenv is in control. This also makes sure we use a cleaned up environment in replacing variables in argv.
2016-07-01manager: Fixing a debug printf formatting mistake (#3640)Kyle Walker
A 'llu' formatting statement was used in a debugging printf statement instead of a 'PRIu64'. Correcting that mistake here.