Age | Commit message (Collapse) | Author |
|
Various seccomp fixes and NEWS update.
|
|
|
|
Properly synthesize -.slice and init.scope
|
|
|
|
Since this unit is synthesized anyway there's no point in actually shipping it
on disk. This also has the benefit that "cd /usr/lib/systemd/system ; ls *"
won't be confused by the leading dash of the file name anymore.
|
|
callbacks
Previously, we'd synthesize the root slice unit and the init scope unit in the
enumerator callbacks for the unit type. This is problematic if either of them
is already referenced from a unit that is loaded as result of another unit
type's enumerator logic.
Let's clean this up and simply create the two objects from the enumerator
callbacks, if they are not around yet. Do the actual filling in of the settings
from the unit_load() callbacks, to match how other units are loaded.
Fixes: #4322
|
|
|
|
This validates the system call set table and many of our seccomp-util.c APIs.
|
|
This allows us to unify most of the code in apply_protect_kernel_modules() and
apply_private_devices().
|
|
"oldumount()" is not a syscall, but simply a wrapper for it, the actual syscall
nr is called "umount" (and the nr of umount() is called umount2 internally).
"sysctl()" is not a syscall, but "_syscall()" is. Fix this in the table.
Without these changes libseccomp cannot actually translate the tables in full.
This wasn't noticed before as the code was written defensively for this case.
|
|
This adds a new seccomp_init_conservative() helper call that is mostly just a
wrapper around seccomp_init(), but turns off NNP and adds in all secondary
archs, for best compatibility with everything else.
Pretty much all of our code used the very same constructs for these three
steps, hence unifying this in one small function makes things a lot shorter.
This also changes incorrect usage of the "scmp_filter_ctx" type at various
places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type
(pretty poor choice already!) that casts implicitly to and from all other
pointer types (even poorer choice: you defined a confusing type now, and don't
even gain any bit of type safety through it...). A lot of the code assumed the
type would refer to a structure, and hence aded additional "*" here and there.
Remove that.
|
|
seccomp_add_syscall_filter_set()
Let's simplify this call, by making use of the new infrastructure.
This is actually more in line with Djalal's original patch but instead of
search the filter set in the array by its name we can now use the set index and
jump directly to it.
|
|
A variety of fixes:
- rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main
instance of it (the syscall_filter_sets[] array) used to abbreviate
"SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not
mix and match too wildly. Let's pick the shorter name in this case, as it is
sufficiently well established to not confuse hackers reading this.
- Export explicit indexes into the syscall_filter_sets[] array via an enum.
This way, code that wants to make use of a specific filter set, can index it
directly via the enum, instead of having to search for it. This makes
apply_private_devices() in particular a lot simpler.
- Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to
find a set by its name, seccomp_add_syscall_filter_set() to add a set to a
seccomp object.
- Update SystemCallFilter= parser to use extract_first_word(). Let's work on
deprecating FOREACH_WORD_QUOTED().
- Simplify apply_private_devices() using this functionality
|
|
|
|
Let's prefer early-exit over deep-indented if blocks. Not behavioural change.
|
|
This is a follow-up for fb8b0869a7bc30e23be175cf978df23192d59118, and makes a
couple of minor clean-up changes:
- The field name in the timestamp file is changed from "TimestampNSec=" to
"TIMESTAMP_NSEC=". This is done simply to reflect the fact that we parse the
file with the env var file parser, and hence the contents should better
follow the usual capitalization of env vars, i.e. be all uppercase.
- Needless negation of the errno parameter log_error_errno() and friends has
been removed.
- Instead of manually calculating the nsec remainder of the timestamp, use
timespec_store().
- We now check whether we were able to write the timestamp file in full with
fflush_and_check() the way we usually do it.
|
|
Commandline parsing simplification and udev fix
|
|
test: lets add more tests to cover SupplementaryGroups= cases.
|
|
shared, systemctl: teach is-enabled to show install targets
|
|
When systemd-networkd is run on the same IPv6 enabled interface where
radvd is announcing prefixes, a route is being set up pointing to the
interface address. As this will fail with an invalid argument error,
the link is marked as failed and the following message like the
following will appear in in the logs:
systemd-networkd[21459]: eth1: Could not set NDisc route or address: Invalid argument
systemd-networkd[21459]: eth1: Failed
Should the interface be required by systemd-networkd-wait-online,
network-online.target will wait until its timeout hits thereby
significantly delaying system startup.
The fix is to check whether the gateway address obtained from NDisc
messages is equal to any of the interface addresses on the same link
and not set the NDisc route in that case.
|
|
Remove the assert and check the return code of sysconf(_SC_NGROUPS_MAX).
_SC_NGROUPS_MAX maps to NGROUPS_MAX which is defined in <limits.h> to
65536 these days. The value is a sysctl read-only
/proc/sys/kernel/ngroups_max and the kernel assumes that it is always
positive otherwise things may break. Follow this and support only
positive values for all other case return either -errno or -EOPNOTSUPP.
Now if there are systems that want to re-write NGROUPS_MAX then they
should not pass SupplementaryGroups= in units even if it is empty, in
this case nothing fails and we just ignore supplementary groups. However
if SupplementaryGroups= is passed even if it is empty we have to assume
that there will be groups manipulation from our side or the kernel and
since the kernel always assumes that NGROUPS_MAX is positive, then
follow that and support only positive values.
|
|
|
|
It may be desired by users to know what targets a particular service is
installed into. Improve user friendliness by teaching the is-enabled
command to show such information when used with --full.
This patch makes use of the newly added UnitFileFlags and adds
UNIT_FILE_DRY_RUN flag into it. Since the API had already been modified,
it's now easy to add the new dry-run feature for other commands as
well. As a next step, --dry-run could be added to systemctl, which in
turn might pave the way for a long requested dry-run feature when
running systemctl start.
|
|
Introduce a new enum to get rid of some boolean arguments of unit_file_*
functions. It unifies the code, makes it a bit cleaner and extensible.
|
|
|
|
https://github.com/systemd/systemd/issues/4352 has been fixed
So, we don't need this workaround anymore
|
|
https://github.com/torvalds/linux/commit/036d523641c66bef713042894a17f4335f199e49
> vfs: Don't create inodes with a uid or gid unknown to the vfs
It is expected that filesystems can not represent uids and gids from
outside of their user namespace. Keep things simple by not even
trying to create filesystem nodes with non-sense uids and gids.
So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955
$ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target
Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide.
Press ^] three times within 1s to kill container.
Child died too early.
Selected user namespace base 1073283072 and range 65536.
Failed to mount to /sys/fs/cgroup/systemd: No such file or directory
Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519
Fixes: #4352
|
|
https://github.com/systemd/systemd/pull/4372#issuecomment-253723849:
* `mount_all (outer_child)` creates `container_dir/sys/fs/selinux`
* `mount_all (outer_child)` doesn't patch `container_dir/sys/fs` and so on.
* `mount_sysfs (inner_child)` tries to create `/sys/fs/cgroup`
* This fails
370 stat("/sys/fs", {st_dev=makedev(0, 28), st_ino=13880, st_mode=S_IFDIR|0755, st_nlink=3, st_uid=65534, st_gid=65534, st_blksize=4096, st_blocks=0, st_size=60, st_atime=2016/10/14-05:16:43.398665943, st_mtime=2016/10/14-05:16:43.399665943, st_ctime=2016/10/14-05:16:43.399665943}) = 0
370 mkdir("/sys/fs/cgroup", 0755) = -1 EACCES (Permission denied)
* `mount_syfs (inner_child)` ignores that error and
mount(NULL, "/sys", NULL, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0
* `mount_cgroups` finally fails
|
|
https://github.com/systemd/systemd/pull/4372#discussion_r83354107:
I get `open("/proc/self/fdinfo/13", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)`
327 mkdir("/proc", 0755 <unfinished ...>
327 <... mkdir resumed> ) = -1 EEXIST (File exists)
327 stat("/proc", <unfinished ...>
327 <... stat resumed> {st_dev=makedev(8, 1), st_ino=28585, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=4, st_size=1024, st_atime=2016/10/14-02:55:32, st_mtime=2016/
327 mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL <unfinished ...>
327 <... mount resumed> ) = 0
327 lstat("/proc", <unfinished ...>
327 <... lstat resumed> {st_dev=makedev(0, 34), st_ino=1, st_mode=S_IFDIR|0555, st_nlink=75, st_uid=65534, st_gid=65534, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2016/10/14-03:13:35.971031263,
327 lstat("/proc/sys", {st_dev=makedev(0, 34), st_ino=4026531855, st_mode=S_IFDIR|0555, st_nlink=1, st_uid=65534, st_gid=65534, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2016/10/14-03:13:39.1630
327 openat(AT_FDCWD, "/proc", O_RDONLY|O_DIRECTORY|O_CLOEXEC|O_PATH) = 11</proc>
327 name_to_handle_at(11</proc>, "sys", {handle_bytes=128}, 0x7ffe3a238604, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
327 name_to_handle_at(11</proc>, "", {handle_bytes=128}, 0x7ffe3a238608, AT_EMPTY_PATH) = -1 EOPNOTSUPP (Operation not supported)
327 openat(11</proc>, "sys", O_RDONLY|O_CLOEXEC|O_PATH) = 13</proc/sys>
327 open("/proc/self/fdinfo/13", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
327 close(13</proc/sys> <unfinished ...>
327 <... close resumed> ) = 0
327 close(11</proc> <unfinished ...>
327 <... close resumed> ) = 0
-bash-4.3# ls -ld /proc/
dr-xr-xr-x 76 65534 65534 0 Oct 14 02:57 /proc/
-bash-4.3# ls -ld /proc/1
dr-xr-xr-x 9 root root 0 Oct 14 02:57 /proc/1
-bash-4.3# ls -ld /proc/1/fdinfo
dr-x------ 2 65534 65534 0 Oct 14 03:00 /proc/1/fdinfo
|
|
|
|
This is minor but lets try to split and move bit by bit cgroups and
portable environment setup before applying the security context.
|
|
|
|
|
|
This fixes: https://github.com/systemd/systemd/issues/4357
Let's lookup and cache creds then apply them. We also switch from
getgroups() to getgrouplist().
|
|
rename failure-action to emergency-action and use it for ctrl+alt+del burst
|
|
This stripping is contolled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
|
|
- do not crash if an option without value is specified on the kernel command
line, e.g. "udev.log-priority" :P
- simplify the code a bit
- warn about unknown "udev.*" options — this should make it easier to spot
typos and reduce user confusion
|
|
This makes journald use the common option parsing functionality.
One behavioural change is implemented:
"systemd.journald.forward_to_syslog" is now equivalent to
"systemd.journald.forward_to_syslog=1".
I think it's nicer to use this way.
|
|
No functional change.
|
|
catalog: add more Korean translations
|
|
Add more Korean translations of journal and DNSSEC log messages.
|
|
Fix typo: s/ournald.conf/journald.conf/
Change also "시스템의 다음 위치에" to "시스템을 별도 위치에" to make
a clear sentence.
|
|
|
|
NEWS: fix typos
|
|
|
|
The log forward levels can be configured through kernel command line.
|
|
|
|
core: if the start command vanishes during runtime don't hit an assert
|
|
Fixes #4306
|
|
|