Age | Commit message (Collapse) | Author |
|
endocode/djalal/sandbox-first-protection-kernelmodules-v1
core:sandbox: Add ProtectKernelModules= and some fixes
|
|
Fixes: #4181
|
|
propagation
|
|
Lets go further and make /lib/modules/ inaccessible for services that do
not have business with modules, this is a minor improvment but it may
help on setups with custom modules and they are limited... in regard of
kernel auto-load feature.
This change introduce NameSpaceInfo struct which we may embed later
inside ExecContext but for now lets just reduce the argument number to
setup_namespace() and merge ProtectKernelModules feature.
|
|
|
|
This just adds capabilities test.
|
|
Add an "invocation ID" concept to the service manager
|
|
This is a bit crude and only works for new systemd versions which
have libsystemd-shared.
|
|
Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement
of an if statement don't fall into a trap, and find the tail for the list via
strv_length().
|
|
If the new item is inserted before the first item in the list, then the
head must be updated as well.
Add a test to the list unit test to check for this.
|
|
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
|
|
propagation
Better safe.
|
|
|
|
|
|
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.
(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)
Fixes: #3867
|
|
If a dir is marked to be inaccessible then everything below it should be masked
by it.
|
|
ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
|
|
Also some trivial tests for STR_IN_SET and STRPTR_IN_SET.
|
|
|
|
Strerror removal and other janitorial cleanups
|
|
|
|
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
|
|
A follow-up for #3818 (992e8f2).
|
|
One trailing dot is valid, but more than one isn't. This also fixes glibc's
posix/tst-getaddrinfo5 test.
Fixes #3978.
|
|
Fixes #3882
|
|
Trivial fixes to udev and condition tests
|
|
Our tests should test for OOM too explicitly, hence fix the test accordingly
|
|
Let's add one more test that came up during the discussion of an issue.
The selected name with 69 chars is above the Linux hostname limit of 64.
|
|
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
|
|
It was meant to write to q instead of t
FAIL: test-id128
================
=================================================================
==125770==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd4615bd31 at pc 0x7a2f41b1bf33 bp 0x7ffd4615b750 sp 0x7ffd4615b748
WRITE of size 1 at 0x7ffd4615bd31 thread T0
#0 0x7a2f41b1bf32 in id128_to_uuid_string src/libsystemd/sd-id128/id128-util.c:42
#1 0x401f73 in main src/test/test-id128.c:147
#2 0x7a2f41336341 in __libc_start_main (/lib64/libc.so.6+0x20341)
#3 0x401129 in _start (/home/crrodriguez/scm/systemd/.libs/test-id128+0x401129)
Address 0x7ffd4615bd31 is located in stack of thread T0 at offset 1409 in frame
#0 0x401205 in main src/test/test-id128.c:37
This frame has 23 object(s):
[32, 40) 'b'
[96, 112) 'id'
[160, 176) 'id2'
[224, 240) 'a'
[288, 304) 'b'
[352, 368) 'a'
[416, 432) 'b'
[480, 496) 'a'
[544, 560) 'b'
[608, 624) 'a'
[672, 688) 'b'
[736, 752) 'a'
[800, 816) 'b'
[864, 880) 'a'
[928, 944) 'b'
[992, 1008) 'a'
[1056, 1072) 'b'
[1120, 1136) 'a'
[1184, 1200) 'b'
[1248, 1264) 'a'
[1312, 1328) 'b'
[1376, 1409) 't' <== Memory access at offset 1409 overflows this variable
[1472, 1509) 'q'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow src/libsystemd/sd-id128/id128-util.c:42 in id128_to_uuid_string
Shadow bytes around the buggy address:
0x100028c23750: f2 f2 00 00 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f2 f2
0x100028c23760: f2 f2 00 00 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f2 f2
0x100028c23770: f2 f2 00 00 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f2 f2
0x100028c23780: f2 f2 00 00 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f2 f2
0x100028c23790: f2 f2 00 00 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f2 f2
=>0x100028c237a0: f2 f2 00 00 00 00[01]f4 f4 f4 f2 f2 f2 f2 00 00
0x100028c237b0: 00 00 05 f4 f4 f4 00 00 00 00 00 00 00 00 00 00
0x100028c237c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100028c237d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100028c237e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100028c237f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==125770==ABORTING
FAIL test-id128 (exit status: 1)
|
|
ASAN is unable to handle it.
|
|
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
|
|
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.
No functional changes.
|
|
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
|
|
|
|
This patch improves parsing and generation of timestamps and calendar
specifications in two ways:
- The week day is now always printed in the abbreviated English form, instead
of the locale's setting. This makes sure we can always parse the week day
again, even if the locale is changed. Given that we don't follow locale
settings for printing timestamps in any other way either (for example, we
always use 24h syntax in order to make uniform parsing possible), it only
makes sense to also stick to a generic, non-localized form for the timestamp,
too.
- When parsing a timestamp, the local timezone (in its DST or non-DST name)
may be specified, in addition to "UTC". Other timezones are still not
supported however (not because we wouldn't want to, but mostly because libc
offers no nice API for that). In itself this brings no new features, however
it ensures that any locally formatted timestamp's timezone is also parsable
again.
These two changes ensure that the output of format_timestamp() may always be
passed to parse_timestamp() and results in the original input. The related
flavours for usec/UTC also work accordingly. Calendar specifications are
extended in a similar way.
The man page is updated accordingly, in particular this removes the claim that
timestamps systemd prints wouldn't be parsable by systemd. They are now.
The man page previously showed invalid timestamps as examples. This has been
removed, as the man page shouldn't be a unit test, where such negative examples
would be useful. The man page also no longer mentions the names of internal
functions, such as format_timestamp_us() or UNIX error codes such as EINVAL.
|
|
|
|
|
|
Depending on how binutils was configured and the --enable-fast-install
configure option, the test binary might be called either name.
Fixes: https://github.com/systemd/systemd/issues/3838
|
|
Private devices don't exist when running in a container, so skip the related
tests.
|
|
No point running tests against process 1 if systemd is not running as that
process. This is a rework of an unpublished patch by @9muir.
|
|
The condition tests for hostname will fail if hostname looks like an id128.
The test function attempts to convert hostname to an id128, and if that
succeeds compare it to the machine ID (presumably because the 'hostname'
condition test is overloaded to also test machine ID). That will typically
fail, and unfortunately the 'mock' utility generates a random hostname that
happens to have the same format as an id128, thus causing a test failure.
|
|
|
|
Accept both files with and without trailing newlines. Apparently some rkt
releases generated them incorrectly, missing the trailing newlines, and we
shouldn't break that.
|
|
User expectations are broken when "systemctl enable /some/path/service.service"
behaves differently to "systemctl link ..." followed by "systemctl enable".
From user's POV, "enable" with the full path just combines the two steps into
one.
Fixes #3010.
|
|
uuid/id128 code rework
|
|
This way we can reuse them for validating User=/Group= settings in unit files
(to be added in a later commit).
Also, add some tests for them.
|
|
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
|
|
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.
The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().
In follow-up patches we can use this reduce the code in nspawn and
machine-id-setup by adopted the common implementation.
|
|
Let's lot at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.
This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter listed shouldn't get too
long.
Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
|