Age | Commit message (Collapse) | Author |
|
It's not safe invoking NSS from PID 1, hence fork off worker processes
that upload the policy into the kernel for busnames.
|
|
also mounting /etc read-only
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.
With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
|
|
With Symlinks= we can manage one or more symlinks to AF_UNIX or FIFO
nodes in the file system, with the same lifecycle as the socket itself.
This has two benefits: first, this allows us to remove /dev/log and
/dev/initctl from /dev, thus leaving only symlinks, device nodes and
directories in the /dev tree. More importantly however, this allows us
to move /dev/log out of /dev, while still making it accessible there, so
that PrivateDevices= can provide /dev/log too.
|
|
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.
ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.
This patch also enables these settings for all our long-running services.
Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
|
|
Only accept cpu quota values in percentages, get rid of period
definition.
It's not clear whether the CFS period controllable per-cgroup even has a
future in the kernel, hence let's simplify all this, hardcode the period
to 100ms and only accept percentage based quota values.
|
|
Introduce a (unsigned long) -1 as "unset" state for cpu shares/block io
weights, and keep the startup unit set around all the time.
|
|
Similar to CPUShares= and BlockIOWeight= respectively. However only
assign the specified weight during startup. Each control group
attribute is re-assigned as weight by CPUShares=weight and
BlockIOWeight=weight after startup. If not CPUShares= or
BlockIOWeight= be specified, then the attribute is re-assigned to each
default attribute value. (default cpu.shares=1024, blkio.weight=1000)
If only CPUShares=weight or BlockIOWeight=weight be specified, then
that implies StartupCPUShares=weight and StartupBlockIOWeight=weight.
|
|
|
|
|
|
|
|
It's used for the FailureAction property as well.
|
|
tcpwrap is legacy code, that is barely maintained upstream. It's APIs
are awful, and the feature set it exposes (such as DNS and IDENT
access control) questionnable. We should not support this natively in
systemd.
Hence, let's remove the code. If people want to continue making use of
this, they can do so by plugging in "tcpd" for the processes they start.
With that scheme things are as well or badly supported as they were from
traditional inetd, hence no functionality is really lost.
|
|
|
|
Keep mounts done by udev rules private to udevd. Also, document how
MountFlags= may be used for this.
|
|
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
|
|
There are three directives to specify bus name polices in .busname
files:
* AllowUser [username] [access]
* AllowGroup [groupname] [access]
* AllowWorld [access]
Where [access] is one of
* 'see': The user/group/world is allowed to see a name on the bus
* 'talk': The user/group/world is allowed to talk to a name
* 'own': The user/group/world is allowed to own a name
There is no user added yet in this commit.
|
|
already explicitly set
|
|
Inexplicably, 550a40ec ('core: do not print invalid utf-8 in error
messages') only fixed two paths. Convert all of them now.
|
|
load-fragment.c
The parse code actually checked for specific lvalue names, which is
really wrong for supposedly generic parsers...
|
|
Let's keep specific config parsers close to where they are needed. Only
the really generic ones should be defined in conf-parser.[ch].
|
|
"level" is a bit too generic, let's clarify what kind of level we are
referring to here.
|
|
As discussed on the ML these are useful to manage runtime directories
below /run for services.
|
|
This new unit settings allows restricting which address families are
available to processes. This is an effective way to minimize the attack
surface of services, by turning off entire network stacks for them.
This is based on seccomp, and does not work on x86-32, since seccomp
cannot filter socketcall() syscalls on that platform.
|
|
for sizes
According to Wikipedia it is customary to specify hardware metrics and
transfer speeds to the basis 1000 (SI decimal), while software metrics
and physical volatile memory (RAM) sizes to the basis 1024 (IEC binary).
So far we specified everything in IEC, let's fix that and be more
true to what's otherwise customary. Since we don't want to parse "Mi"
instead of "M" we document each time what the context used is.
|
|
particular devices nodes
|
|
This permit to switch to a specific apparmor profile when starting a daemon. This
will result in a non operation if apparmor is disabled.
It also add a new build requirement on libapparmor for using this feature.
|
|
processes
|
|
|
|
|
|
This is useful to prohibit execution of non-native processes on systems,
for example 32bit binaries on 64bit systems, this lowering the attack
service on incorrect syscall and ioctl 32→64bit mappings.
|
|
architecture support for system calls
Also, turn system call filter bus properties into complex types instead
of concatenated strings.
|
|
- Allow configuration of an errno error to return from blacklisted
syscalls, instead of immediately terminating a process.
- Fix parsing logic when libseccomp support is turned off
- Only keep the actual syscall set in the ExecContext, and generate the
string version only on demand.
|
|
|
|
Suggested-by: Russ Allbery <rra@debian.org>
|
|
Previously we'd open the connection in the originating namespace, which
meant most peers of the bus would not be able to make sense of the
PID/UID/... identity of us since we didn't exist in the namespace they
run in. However they require this identity for privilege decisions,
hence disallowing access to anything from the host.
Instead, when connecting to a container, create a temporary subprocess,
make it join the container's namespace and then connect from there to
the kdbus instance. This is similar to how we do it for socket
conections already.
THis also unifies the namespacing code used by machinectl and the bus
APIs.
|
|
The only problem is that libgen.h #defines basename to point to it's
own broken implementation instead of the GNU one. This can be fixed
by #undefining basename.
|
|
|
|
PrivateTmp= namespaces
|
|
setting and make use of it where applicable
|
|
Pass on the line on which a section was decleared to the parsers, so they
can distinguish between multiple sections (if they chose to). Currently
no parsers take advantage of this, but a follow-up patch will do that
to distinguish
[Address]
Address=192.168.0.1/24
Label=one
[Address]
Address=192.168.0.2/24
Label=two
from
[Address]
Address=192.168.0.1/24
Label=one
Address=192.168.0.2/24
Label=two
|
|
|
|
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
|
|
|
|
|
|
We now treat passno as boleans in the generators, and don't need this any more. fsck itself
is able to sequentialize checks on the same local media, so in the common case the ordering
is redundant.
It is still possible to force an order by using .d fragments, in case that is desired.
|
|
each invocation
We can determine the list entry type via the typeof() gcc construct, and
so we should to make the macros much shorter to use.
|
|
The code was actually safe, because b should
never be null, because if rvalue is empty, a different
branch is taken. But we *do* check for NULL in the
loop above, so it's better to also check here for symmetry.
|
|
|
|
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.
This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.
With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.
https://bugs.freedesktop.org/show_bug.cgi?id=69740
|
|
The cgroup attribute memory.soft_limit_in_bytes is unlikely to stay
around in the kernel for good, so let's not expose it for now. We can
readd something like it later when the kernel guys decided on a final
API for this.
|