path: root/src/shared/seccomp-util.c
Age  Commit message  Author
2017-02-14Define clone order on ppc (#5325)Zbigniew Jędrzejewski-Szmek
This was tested on ppc64le. Assume the same is true for ppc64.
2017-02-12seccomp: disable RestrictAddressFamilies= for the ABI we shall block, not the one we are compiled for (#5272)Lennart Poettering
It's a difference. Not a big one, but let's be correct here.
2017-02-10seccomp: order seccomp ABI list, so that our native ABI comes last (#5306)Lennart Poettering
This way, we can still call seccomp() ourselves, even if seccomp() is blocked by the filter we are installing. Fixes: #5300
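The ordering step can be sketched as follows. This is a minimal illustration, assuming the ABI list is a plain array: `enum abi` and `order_native_last()` are hypothetical names, and the real code works with libseccomp's architecture constants rather than this enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ABI identifiers, for illustration only. */
enum abi { ABI_X86, ABI_X32, ABI_X86_64 };

/* Moves the native ABI to the end of the list and keeps the relative order
 * of all other entries. Returns 0 on success, -1 if the native ABI is not
 * part of the list. */
static int order_native_last(enum abi *list, size_t n, enum abi native) {
        size_t i, pos = n;

        assert(list);

        for (i = 0; i < n; i++)
                if (list[i] == native) {
                        pos = i;
                        break;
                }
        if (pos == n)
                return -1;

        /* Shift everything after the native entry one slot to the left. */
        for (i = pos; i + 1 < n; i++)
                list[i] = list[i + 1];
        list[n - 1] = native;
        return 0;
}
```

If one filter is installed per ABI in list order, the filter covering the calling process's own ABI goes in last, so the earlier installations cannot be affected by it.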
2017-02-09seccomp: add forgotten munmap() syscall to @file-system (#5291)Lennart Poettering
We added mmap() and mmap2(), but forgot munmap(). Fix that. Pointed out by @lucaswerkmeister: https://github.com/systemd/systemd/pull/4537#issuecomment-273275298
2017-02-08seccomp: on s390 the clone() parameters are reversedLennart Poettering
Add a bit of code that tries to get the right parameter order in place for some of the better known architectures, and skips restrict_namespaces for other archs. This also bypasses the test on archs where we don't know the right order. In this case I didn't bother with testing the case where no filter is applied, since that is hopefully just an issue for now, as there's nothing stopping us from supporting more archs, we just need to know which order is right. Fixes: #5241
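A rough sketch of the idea, not the code from this commit: look up which argument of the raw clone() system call carries the flags, and report "unknown" for every architecture not listed, so that the caller can skip the filter. The function name and the architecture strings are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the flags argument of the raw clone() system call on
 * the named architecture: 0 where flags come first, 1 where the first two
 * arguments are swapped (s390), -1 where the order is not known here.
 * The name strings are illustrative and not part of any API. */
static int clone_flags_arg_index(const char *arch) {
        static const char *const flags_first[] = { "x86", "x86-64", "arm", "arm64" };
        static const char *const flags_second[] = { "s390", "s390x" };
        size_t i;

        assert(arch);

        for (i = 0; i < sizeof(flags_first) / sizeof(flags_first[0]); i++)
                if (strcmp(arch, flags_first[i]) == 0)
                        return 0;

        for (i = 0; i < sizeof(flags_second) / sizeof(flags_second[0]); i++)
                if (strcmp(arch, flags_second[i]) == 0)
                        return 1;

        return -1; /* unknown order: do not install the namespace filter */
}
```

Treating unlisted architectures as unknown, instead of guessing, mirrors the commit's choice to skip restrict_namespaces where the order has not been verified.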
2017-02-08seccomp: MemoryDenyWriteExecute= should affect both mmap() and mmap2() (#5254)Lennart Poettering
On i386 we block the old mmap() call entirely, since we cannot properly filter it. Thankfully glibc has not used it for quite some time. Fixes: #5240
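The decision described here can be modelled in a few lines. This is a sketch under stated assumptions: `mmap_verdict()` and its `args_in_registers` parameter are hypothetical, and the check on the protection bits (writable and executable at once) is an assumption about what the filter looks for, not something this log spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

enum verdict { VERDICT_ALLOW, VERDICT_DENY };

/* Models the filter decision for one mmap()-style call.
 * args_in_registers: false for the old i386 mmap(), whose arguments are
 * passed through a pointer to a struct and hence cannot be inspected. */
static enum verdict mmap_verdict(bool args_in_registers, int prot) {
        const int wx = PROT_WRITE | PROT_EXEC;

        assert(prot >= 0);

        if (!args_in_registers)
                return VERDICT_DENY; /* cannot filter it, so block it entirely */

        /* Masked comparison: deny only if both bits are set together. */
        return (prot & wx) == wx ? VERDICT_DENY : VERDICT_ALLOW;
}
```

The same predicate would apply to both mmap() and mmap2(), since they differ only in how the file offset is expressed.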
2017-02-06seccomp: RestrictAddressFamilies= is not supported on i386/s390/s390x, make it a NOPLennart Poettering
See: #5215
2017-02-05seccomp: don't ever try to add an ABI before removing the default native ABI (#5230)Evgeny Vereshchagin
https://github.com/systemd/systemd/issues/5215#issuecomment-277156262
libseccomp does not allow you to add architectures to a filter that doesn't match the byte ordering of the architectures already added to the filter (it would be a mess, not to mention largely pointless) and since systemd attempts to add an ABI before removing the default native ABI, you will always fail on Power (either due to ppc or ppc64le). The fix is to remove the native ABI before adding a new ABI so you don't run into problems with byte ordering. You would likely see the same failure on a MIPS system. Thanks @pcmoore!
2017-01-17seccomp: minor simplifications for is_seccomp_available()Lennart Poettering
2017-01-17seccomp: rework seccomp code, to improve compat with some archsLennart Poettering
This substantially reworks the seccomp code, to ensure better compatibility with some architectures, including i386. So far we relied on libseccomp's internal handling of the multiple syscall ABIs supported on Linux. This is problematic however, as it does not define clear semantics if an ABI is not able to support specific seccomp rules we install. This rework hence changes a couple of things:
- We no longer use seccomp_rule_add(), but only seccomp_rule_add_exact(), and fail the installation of a filter if the architecture doesn't support it.
- We no longer rely on adding multiple syscall architectures to a single filter, but instead install a separate filter for each syscall architecture supported. This way, we can install a strict filter for x86-64, while permitting a less strict filter for i386.
- All high-level filter additions are now moved from execute.c to seccomp-util.c, so that we can test them independently of the service execution logic.
- Tests have been added for all types of our seccomp filters.
- SystemCallFilters= and SystemCallArchitectures= are now implemented in independent filters and installation logic, as they semantically are very much independent of each other.
Fixes: #4575
2016-12-27seccomp: move bdflush() system call to @obsolete filter groupLennart Poettering
The system call is obsolete after all.
2016-12-27seccomp: add proper help string for @resources seccomp filter setLennart Poettering
2016-12-27seccomp: add two new filter sets: @reboot and @swapLennart Poettering
These group reboot()/kexec() and swapon()/swapoff() respectively.
2016-11-21seccomp: add @filesystem syscall group (#4537)Lennart Poettering
@filesystem groups various file system operations, such as opening files and directories for read/write and stat()ing them, plus renaming, deleting, symlinking, hardlinking.
2016-11-04core: add new RestrictNamespaces= unit file settingLennart Poettering
This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namespace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should improve security quite a bit, as user namespacing in particular was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.
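To make the "mnt ipc" semantics concrete, here is a minimal sketch that turns such a list into a mask of permitted CLONE_NEW* flags and checks a clone()/unshare() flags value against it. The helper names and the table are assumptions for illustration, the table covers only six namespace types, and the actual filter is expressed as seccomp rules rather than as a C predicate.

```c
#include <assert.h>
#include <linux/sched.h> /* CLONE_NEW* flag values */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const struct {
        const char *name;
        unsigned long flag;
} namespace_table[] = {
        { "ipc",  CLONE_NEWIPC  },
        { "mnt",  CLONE_NEWNS   },
        { "net",  CLONE_NEWNET  },
        { "pid",  CLONE_NEWPID  },
        { "user", CLONE_NEWUSER },
        { "uts",  CLONE_NEWUTS  },
};

#define NAMESPACE_FLAGS_ALL \
        ((unsigned long) (CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWNET | \
                          CLONE_NEWPID | CLONE_NEWUSER | CLONE_NEWUTS))

/* Parses a space-separated list such as "mnt ipc" into a mask of permitted
 * namespace flags. Returns 0 on success, -1 on an unknown name. */
static int namespace_mask_from_string(const char *s, unsigned long *ret) {
        unsigned long mask = 0;

        assert(s);
        assert(ret);

        while (*s) {
                size_t len, i;
                bool found = false;

                s += strspn(s, " ");
                len = strcspn(s, " ");
                if (len == 0)
                        break;

                for (i = 0; i < sizeof(namespace_table) / sizeof(namespace_table[0]); i++)
                        if (strlen(namespace_table[i].name) == len &&
                            strncmp(s, namespace_table[i].name, len) == 0) {
                                mask |= namespace_table[i].flag;
                                found = true;
                                break;
                        }
                if (!found)
                        return -1;
                s += len;
        }

        *ret = mask;
        return 0;
}

/* A flags value passes if it requests no namespace type outside the mask. */
static bool namespace_flags_permitted(unsigned long flags, unsigned long permitted) {
        return (flags & NAMESPACE_FLAGS_ALL & ~permitted) == 0;
}
```

In this model an empty mask corresponds to RestrictNamespaces=yes, and a mask with every bit set corresponds to RestrictNamespaces=no.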
2016-11-03seccomp-util, analyze: export comments as a help stringZbigniew Jędrzejewski-Szmek
Just to make the whole thing easier for users.
2016-11-03seccomp-util: move @default to the first positionZbigniew Jędrzejewski-Szmek
Now that the list is user-visible, @default should be first.
2016-11-02seccomp: add two new syscall groupsLennart Poettering
@resources contains various syscalls that alter resource limits and memory and scheduling parameters of processes. As such they are good candidates to block for most services. @basic-io contains a number of basic syscalls for I/O, similar to the list seccomp v1 permitted but slightly more complete. It should be useful for building basic whitelisting for minimal sandboxes
2016-11-02seccomp: include pipes and memfd in @ipcLennart Poettering
These system calls clearly fall in the @ipc category, hence should be listed there, simply to avoid confusion and surprise by the user.
2016-11-02seccomp: drop execve() from @process listLennart Poettering
The system call is already part of @default, hence implicitly allowed anyway. Also, if it were actually blocked, systemd could no longer execute the service in question, since the application of seccomp is immediately followed by it.
2016-11-02seccomp: add clock query and sleeping syscalls to "@default" groupLennart Poettering
Timing and sleeping are such basic operations that it makes very little sense to ever block them, hence don't.
2016-11-01seccomp: allow specifying arm64, mips, ppc (#4491)Zbigniew Jędrzejewski-Szmek
"Secondary arch" table for mips is entirely speculative…
2016-10-24seccomp: add new helper call seccomp_load_filter_set()Lennart Poettering
This allows us to unify most of the code in apply_protect_kernel_modules() and apply_private_devices().
2016-10-24seccomp: two fixes for the syscall set tablesLennart Poettering
"oldumount()" is not a syscall, but simply a wrapper for it, the actual syscall nr is called "umount" (and the nr of umount() is called umount2 internally). "sysctl()" is not a syscall, but "_sysctl()" is. Fix this in the table. Without these changes libseccomp cannot actually translate the tables in full. This wasn't noticed before as the code was written defensively for this case.
2016-10-24seccomp: add new seccomp_init_conservative() helperLennart Poettering
This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence added additional "*" here and there. Remove that.
2016-10-24core: rework syscall filter set handlingLennart Poettering
A variety of fixes:
- rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main instance of it (the syscall_filter_sets[] array) used to abbreviate "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not mix and match too wildly. Let's pick the shorter name in this case, as it is sufficiently well established to not confuse hackers reading this.
- Export explicit indexes into the syscall_filter_sets[] array via an enum. This way, code that wants to make use of a specific filter set, can index it directly via the enum, instead of having to search for it. This makes apply_private_devices() in particular a lot simpler.
- Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to find a set by its name, seccomp_add_syscall_filter_set() to add a set to a seccomp object.
- Update SystemCallFilter= parser to use extract_first_word(). Let's work on deprecating FOREACH_WORD_QUOTED().
- Simplify apply_private_devices() using this functionality
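The enum-indexed table and the lookup by name could look roughly like this. It is a simplified sketch: the struct layout, the two sets shown and their abbreviated contents are assumptions for illustration, and only the names SyscallFilterSet, syscall_filter_sets[] and syscall_filter_set_find() are taken from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SyscallFilterSet {
        const char *name;
        const char *const *syscalls; /* NULL-terminated; the real layout may differ */
} SyscallFilterSet;

/* Explicit indexes, so callers can address a set without searching. */
enum {
        SYSCALL_FILTER_SET_CLOCK,
        SYSCALL_FILTER_SET_RAW_IO,
        _SYSCALL_FILTER_SET_MAX
};

/* Abbreviated, illustrative contents. */
static const char *const clock_syscalls[] = { "adjtimex", "settimeofday", NULL };
static const char *const raw_io_syscalls[] = { "ioperm", "iopl", "pciconfig_iobase", NULL };

static const SyscallFilterSet syscall_filter_sets[_SYSCALL_FILTER_SET_MAX] = {
        [SYSCALL_FILTER_SET_CLOCK]  = { "@clock",  clock_syscalls  },
        [SYSCALL_FILTER_SET_RAW_IO] = { "@raw-io", raw_io_syscalls },
};

/* Finds a set by its name, or returns NULL if there is no such set. */
static const SyscallFilterSet *syscall_filter_set_find(const char *name) {
        size_t i;

        assert(name);

        if (name[0] != '@')
                return NULL;

        for (i = 0; i < (size_t) _SYSCALL_FILTER_SET_MAX; i++)
                if (strcmp(syscall_filter_sets[i].name, name) == 0)
                        return &syscall_filter_sets[i];

        return NULL;
}
```

Code that knows which set it needs writes `syscall_filter_sets + SYSCALL_FILTER_SET_RAW_IO` directly, while a parser handling user input goes through the lookup function.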
2016-10-05seccomp: add support for the s390 architecture (#4287)hbrueckner
Add seccomp support for the s390 architecture (31-bit and 64-bit) to systemd. This requires libseccomp >= 2.3.1.
2016-09-06seccomp: also detect if seccomp filtering is enabledFelipe Sateler
In https://github.com/systemd/systemd/pull/4004 , a runtime detection method for seccomp was added. However, it does not detect the case where CONFIG_SECCOMP=y but CONFIG_SECCOMP_FILTER=n. This is possible if the architecture does not support filtering yet. Add a check for that case too. While at it, change get_proc_field usage to use PR_GET_SECCOMP prctl, as that should save a few system calls and (unnecessary) allocations. Previously, reading of /proc/self/stat was done as recommended by prctl(2) as safer. However, given that we need to do the prctl call anyway, let's skip opening, reading and parsing the file. Code for checking inspired by https://outflux.net/teach-seccomp/autodetect.html
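A minimal sketch of the two probes, in the spirit of the detection method linked above. `probe_seccomp()` is a hypothetical name, and the sketch relies on the behaviour documented in prctl(2): PR_GET_SECCOMP fails with EINVAL without CONFIG_SECCOMP, and PR_SET_SECCOMP with SECCOMP_MODE_FILTER and an invalid filter address fails with EFAULT when filtering is built in, and with EINVAL when it is not.

```c
#include <assert.h>
#include <errno.h>
#include <linux/seccomp.h> /* SECCOMP_MODE_FILTER */
#include <stdbool.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Probes kernel seccomp support without installing any filter.
 * ret_basic:  true if the kernel was built with CONFIG_SECCOMP
 * ret_filter: true if it was also built with CONFIG_SECCOMP_FILTER */
static void probe_seccomp(bool *ret_basic, bool *ret_filter) {
        assert(ret_basic);
        assert(ret_filter);

        *ret_basic = prctl(PR_GET_SECCOMP, 0, 0, 0, 0) >= 0;

        /* The NULL filter pointer is never accepted. EFAULT shows that the
         * kernel got as far as trying to read the filter, i.e. that filter
         * mode is known to it. */
        *ret_filter = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, NULL, 0, 0) < 0 &&
                      errno == EFAULT;
}
```

A caller would require both results to be true before trying to install a filter, and otherwise skip the seccomp step.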
2016-08-26Merge pull request #3984 from poettering/refcntEvgeny Vereshchagin
permit bus clients to pin units to avoid automatic GC
2016-08-22core: do not fail at step SECCOMP if there is no kernel support (#4004)Felipe Sateler
Fixes #3882
2016-08-22seccomp: make sure getrlimit() is among the default permitted syscallsLennart Poettering
A lot of basic code wants to know the stack size, and it is safe if they do, hence let's permit getrlimit() (but not setrlimit()) by default. See: #3970
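For reference, this is the kind of query meant here, as a small sketch. `query_stack_limit()` is a hypothetical helper; only getrlimit() and RLIMIT_STACK are standard.

```c
#include <assert.h>
#include <sys/resource.h>

/* Stores the soft limit for the stack size in *ret (in bytes, or
 * RLIM_INFINITY if unlimited). Returns 0 on success, -1 on failure.
 * This only reads the limit; changing it would need setrlimit(). */
static int query_stack_limit(rlim_t *ret) {
        struct rlimit rl;

        assert(ret);

        if (getrlimit(RLIMIT_STACK, &rl) < 0)
                return -1;

        *ret = rl.rlim_cur;
        return 0;
}
```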
2016-06-13core: improve seccomp syscall grouping a bitLennart Poettering
This adds three new seccomp syscall groups: @keyring for kernel keyring access, @cpu-emulation for CPU emulation features, for example vm86() for dosemu and suchlike, and @debug for ptrace() and related calls. Also, the @clock group is updated with more syscalls that alter the system clock. capset() is added to @privileged, and pciconfig_iobase() is added to @raw-io. Finally, @obsolete is cleaned up. A number of syscalls that never existed on Linux and have no number assigned on any architecture are removed, as they only exist in the man pages and other operating systems, but not in code at all. create_module() is moved from @module to @obsolete, as it is an obsolete system call. mem_getpolicy() is removed from the @obsolete list, as it is not obsolete, but simply a NUMA API.
2016-06-01core: add pre-defined syscall groups to SystemCallFilter= (#3053) (#3157)Topi Miettinen
Implement sets of system calls to help construct system call filters. A set name starts with '@' to distinguish it from a system call name. Closes: #3053, #3157
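The '@' convention makes classifying the entries of a filter list trivial, as this small sketch shows. Both function names are assumptions made for this example, and the list is taken as already split into words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An entry names a predefined set if it starts with '@', and a single
 * system call otherwise. */
static bool filter_entry_is_set(const char *entry) {
        assert(entry);
        return entry[0] == '@';
}

/* Counts how many entries of a filter list are sets and how many are
 * plain system call names. */
static void count_filter_entries(const char *const *entries, size_t n,
                                 size_t *ret_sets, size_t *ret_syscalls) {
        size_t i, sets = 0, syscalls = 0;

        assert(entries || n == 0);
        assert(ret_sets);
        assert(ret_syscalls);

        for (i = 0; i < n; i++)
                if (filter_entry_is_set(entries[i]))
                        sets++;
                else
                        syscalls++;

        *ret_sets = sets;
        *ret_syscalls = syscalls;
}
```

Since '@' cannot appear in a system call name, the prefix check alone is enough to tell the two kinds of entry apart.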
2016-02-10tree-wide: remove Emacs lines from all filesDaniel Mack
This should be handled fine now by .dir-locals.el, so there is no need to carry that stuff in every file.
2015-12-06shared: include what we useThomas Hindoe Paaboel Andersen
The next step of a general cleanup of our includes. This one mostly adds missing includes but there are a few removals as well.
2015-11-16tree-wide: sort includesThomas Hindoe Paaboel Andersen
Sort the includes according to the new coding style.
2015-10-24util-lib: split our string related calls from util.[ch] into its own file string-util.[ch]Lennart Poettering
There are more than enough calls doing string manipulations to deserve its own files, hence do something about it. This patch also sorts the #include blocks of all files that needed to be updated, according to the sorting suggestions from CODING_STYLE. Since pretty much every file needs our string manipulation functions this effectively means that most files have sorted #include blocks now. Also touches a few unrelated include files.
2014-02-18seccomp: add helper call to add all secondary archs to a seccomp filterLennart Poettering
And make use of it where appropriate for executing services and for nspawn.
2014-02-13core: add SystemCallArchitectures= unit setting to allow disabling of non-native architecture support for system callsLennart Poettering
Also, turn system call filter bus properties into complex types instead of concatenated strings.