Age | Commit message (Collapse) | Author |
|
journald-related shutdown fixes for slow I/O
|
|
(#3724)
Signed-off-by: Andreas Pokorny <andreas.pokorny@canonical.com>
|
|
|
|
This way, slow IO journald has to wait for can't cause it to reach the killing
spree timeout and is hit by SIGKILL in addition to SIGTERM.
|
|
There's really no reason to use 10s here, let's instead default to 90s like we
do for everything else.
The SIGKILL during the final killing spree is in most regards the fourth level
of a safety net, after all: any normal service should have already been stopped
during the normal service shutdown logic, first via SIGTERM and then SIGKILL,
and then also via SIGTERM during the finall killing spree before we send
SIGKILL. And as a fourth level safety net it should only be required in
exceptional cases, which means it's safe to rais the default timeout, as normal
shutdowns should never be delayed by it.
Note that journald excludes itself from the normal service shutdown, and relies
on the final killing spree to terminate it (this is because it wants to cover
the normal shutdown phase's complete logging). If the system's IO is
excessively slow, then the 10s might not be enough for journald to sync
everything to disk and logs might get lost during shutdown.
|
|
|
|
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall number on its own
and allowing proper multiplexed syscalls filtering.
|
|
|
|
This is basically the same change as ea68351.
|
|
Follow up on #3503 (pass service env vars to PAM sessions)
|
|
Prevent free from being called on (a part of) the call-by-reference
variable env when setup_pam fails.
|
|
The unit load queue can be processed in the middle of setting the
unit's properties, so its load_state would no longer be UNIT_STUB
for the check in bus_unit_set_properties(), which would cause it to
incorrectly return an error.
|
|
Commit da4d897e ("core: add cgroup memory controller support on the unified
hierarchy (#3315)") changed the code in src/core/cgroup.c to always write
the real numeric value from the cgroup parameters to the
"memory.limit_in_bytes" attribute file.
For parameters set to CGROUP_LIMIT_MAX, this results in the string
"18446744073709551615" being written into that file, which is UINT64_MAX.
Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1".
This causes a regression on CentOS 7, which is based on kernel 3.10, as the
value is interpreted as *signed* 64 bit, and clamped to 0:
[root@n54 ~]# echo 18446744073709551615 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
0
[root@n54 ~]# echo -1 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
9223372036854775807
Hence, all units that are subject to the limits enforced by the memory
controller will crash immediately, even though they have no actual limit
set. This happens to for the user.slice, for instance:
[ 453.577153] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
[ 453.587024] ffff880810c56780 00000000aae9501f ffff880813d7fcd0 ffffffff816360fc
[ 453.594544] ffff880813d7fd60 ffffffff8163109c ffff88080ffc5000 ffff880813d7fd28
[ 453.602120] ffffffff00000202 fffeefff00000000 0000000000000001 ffff880810c56c03
[ 453.609680] Call Trace:
[ 453.612156] [<ffffffff816360fc>] dump_stack+0x19/0x1b
[ 453.617324] [<ffffffff8163109c>] dump_header+0x8e/0x214
[ 453.622671] [<ffffffff8116d20e>] oom_kill_process+0x24e/0x3b0
[ 453.628559] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
[ 453.634969] [<ffffffff811d4155>] mem_cgroup_oom_synchronize+0x575/0x5a0
[ 453.641721] [<ffffffff811d3520>] ? mem_cgroup_charge_common+0xc0/0xc0
[ 453.648299] [<ffffffff8116da84>] pagefault_out_of_memory+0x14/0x90
[ 453.654621] [<ffffffff8162f4cc>] mm_fault_error+0x68/0x12b
[ 453.660233] [<ffffffff81642012>] __do_page_fault+0x3e2/0x450
[ 453.666017] [<ffffffff816420a3>] do_page_fault+0x23/0x80
[ 453.671467] [<ffffffff8163e308>] page_fault+0x28/0x30
[ 453.676656] Task in /user.slice/user-0.slice/user@0.service killed as a result of limit of /user.slice/user-0.slice/user@0.service
[ 453.688477] memory: usage 0kB, limit 0kB, failcnt 7
[ 453.693391] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 453.700039] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 453.706076] Memory cgroup stats for /user.slice/user-0.slice/user@0.service: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 453.725702] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 453.733614] [ 2837] 0 2837 11950 899 23 0 0 (systemd)
[ 453.741919] Memory cgroup out of memory: Kill process 2837 ((systemd)) score 1 or sacrifice child
[ 453.750831] Killed process 2837 ((systemd)) total-vm:47800kB, anon-rss:3188kB, file-rss:408kB
Fix this issue by special-casing the UINT64_MAX case again.
|
|
By cleaning up before setting up PAM we maintain control of overriding
behavior in setting variables. Otherwise, pam_putenv is in control.
This also makes sure we use a cleaned up environment in replacing
variables in argv.
|
|
Commit d054f0a4 ("tree-wide: use xsprintf() where applicable") used a
semantic patch approach to change a number of locations from
snprintf(buf, sizeof(buf), FMT, ...)
to
xsprintf(buf, FMT, ...)
The problem is that xsprintf() wraps the snprintf() in an
assert_message_se(), so if snprintf() reports an overflow of the
destination buffer, the binary will now terminate.
This hit a user running a version of systemd that was built from a
deeply nested system path.
Fix this by
a) Switching back to snprintf() for this particular case. We should really
rather truncate the location string than crash in such situations.
b) Increasing the size of that static string buffer, to make the event more
unlikely.
|
|
systemd-run --help says:
-E --setenv=NAME=VALUE Set environment
|
|
|
|
==1447== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1447== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==1447== by 0x5350F19: strdup (in /usr/lib64/libc-2.23.so)
==1447== by 0x4E9D435: strv_new_ap (strv.c:166)
==1447== by 0x4E9D5FA: strv_new (strv.c:199)
==1447== by 0x10E665: test_strv_fnmatch (test-strv.c:693)
==1447== by 0x10EAD5: main (test-strv.c:763)
==1447==
|
|
basic/fd-util: introduce stdio_unset_cloexec() function
|
|
|
|
There are some places in the systemd which are use the same pattern:
fd_cloexec(STDIN_FILENO, false);
fd_cloexec(STDOUT_FILENO, false);
fd_cloexec(STDERR_FILENO, false);
to unset CLOEXEC for standard file descriptors. This patch introduces
the stdio_unset_cloexec() function to hide this and make code cleaner.
|
|
Allow date and time ranges in OnCalendar
|
|
|
|
For backwards compatibility, both the new format (Mon..Wed) and
the old format (Mon-Wed) are supported.
|
|
Resolves #3042
|
|
A 'llu' formatting statement was used in a debugging printf statement
instead of a 'PRIu64'. Correcting that mistake here.
|
|
manager: Only invoke a single sigchld per unit within a cleanup cycle
|
|
* networkd: condition_test() can return a negative error, handle that
If a condition check fails with an error we should not consider the check
successful. Fix that.
We should probably also improve logging in this case, but for now, let's just
unbreak this breakage.
Fixes: #3236
* condition: handle unrecognized architectures nicer
When we encounter a check for an architecture we don't know we should not
let the condition check fail with an error code, but instead simply return
false. After all the architecture might just be newer than the ones we know, in
which case it's certainly not our local one.
Fixes: #3236
|
|
make "machinectl clean" asynchronous, and open it up via PolicyKit
|
|
sd_event_get_iteration() (#3631)
This extends the existing event loop iteration counter to 64bit, and exposes it
via a new function sd_event_get_iteration(). This is helpful for cases like
issue #3612. After all, since we maintain the counter anyway, we might as well
expose it.
(This also fixes an unrelated issue in the man page for sd_event_wait() where
micro and milliseconds got mixed up)
|
|
By default, each iteration of manager_dispatch_sigchld() results in a unit level
sigchld event being invoked. For scope units, this results in a scope_sigchld_event()
which can seemingly stall for workloads that have a large number of PIDs within the
scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry
as a result of pid_is_unwaited().
v2:
This patch resolves this condition by only paying to cost of a sigchld in the underlying
scope unit once per sigchld iteration. A new "sigchldgen" member resides within the
Unit struct. The Manager is incremented via the sd event loop, accessed via
sd_event_get_iteration, and the Unit member is set to the same value as the manager each
time that a sigchld event is invoked. If the Manager iteration value and Unit member
match, the sigchld event is not invoked for that iteration.
|
|
sd-device: handle the 'drivers' pseudo-subsystem correctly
|
|
journalctl: Use env variable TMPDIR to save temporary files
|
|
This extends the existing event loop iteration counter to 64bit, and exposes it
via a new function sd_event_get_iteration(). This is helpful for cases like
issue #3612. After all, since we maintain the counter anyway, we might as well
expose it.
(This also fixes an unrelated issue in the man page for sd_event_wait() where
micro and milliseconds got mixed up)
|
|
build-sys: Convert libshared into a private shared library
|
|
Make `journalctl --directory=... --boot 0` work
|
|
The loop on bus_match_run should break and return immediately if
bus->match_callbacks_modified is true. Otherwise the loop may access
free'd data.
|
|
fixes https://bugzilla.redhat.com/show_bug.cgi?id=842060
|
|
--boot=0 magically meant "this boot", but when used with --file/--directory it
should simply refer to the last boot found in the specified journal. This way,
--boot and --list-boots are consistent.
Fixes #3603.
|
|
Those are just local variables and ref_boot_offset is especially
obnoxious.
|
|
Before --this-boot was deprecated in a331b5e6d47243, it did not take
any arguments.
|
|
It works mostly fine, and can be quite useful to examine data from another
system.
OTOH, a single boot id doesn't make sense with --merge, so mixing with --merge
is still not allowed.
|
|
“systemctl show” added an extra blank line after the dump of the
EnvironmentFile property of the unit.
|
|
With commit 6f7da49d00 route-only domains do not get put into resolv.conf's
"search" list any more. Add a comment about the tri-state, to clarify its
semantics and why we are passing a bool parameter into an int type. Also add a
test case for it.
|
|
to hide casting of '-1' strings and make code cleaner.
|
|
Fixes:
```
$ systemctl list-unit-files 'hey\*'
0 unit files listed.
$ systemctl list-unit-files | grep hey
hey\x7eho.service static
```
|
|
We support writing out tags and db files in case a real subsystem called
'drivers' exists, so there is no reason to refuse parsing it.
|
|
The 'drivers' pseudo-subsystem needs special treatment. These pseudo-devices are
found under /sys/bus/drivers/, so needs the real subsystem encoded
in the device_id in order to be resolved.
The reader side already assumed this to be the case.
|
|
Collect the errors and return to the caller, but continue enumerating all devices.
|
|
machinectl: interpret options placed between "shell" verb and machine name
|