Age | Commit message (Collapse) | Author |
|
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
|
|
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
|
|
When a docker container is confined with AppArmor [1] and happens to run
on top of a kernel that supports mount mediation [2], e.g. any Ubuntu
kernel, mount(2) returns EACCES instead of EPERM. This then leads to:
systemd-logind[33]: Failed to mount per-user tmpfs directory /run/user/1000: Permission denied
login[42]: pam_systemd(login:session): Failed to create session: Access denied
and user sessions don't start.
This also applies to selinux that too returns EACCES on mount denial.
[1] https://github.com/docker/docker/blob/master/docs/security/apparmor.md#understand-the-policies
[2] http://bazaar.launchpad.net/~apparmor-dev/apparmor/master/view/head:/kernel-patches/4.7/0025-UBUNTU-SAUCE-apparmor-Add-the-ability-to-mediate-mou.patch
|
|
The parsing functions for [User]TasksMax were inconsistent. Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax. Update them so that they're consistent with other knobs.
* Empty string indicates the default value.
* "infinity" indicates no limit.
While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.
v2: Update empty string to indicate the default value as suggested by Zbigniew
Jędrzejewski-Szmek.
v3: Fixed empty UserTasksMax handling.
|
|
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
|
|
config_parse_user_tasks_max() was incorrectly accepting percentage value
between 1 and 99. Update it to accept 0% and 100%. This brings it in line
with TasksMax handling in systemd.
|
|
|
|
Let's change from a fixed value of 12288 tasks per user to a relative value of
33%, which with the kernel's default of 32768 translates to 10813. This is a
slight decrease of the limit, for no other reason than "33%" sounding like a nice
round number that is close enough to 12288 (which would translate to 37.5%).
(Well, it also has the nice effect of still leaving a bit of room in the PID
space if there are 3 cooperating evil users that try to consume all PIDs...
Also, I like my bikesheds blue).
Since the new value is taken relative, and machined's TasksMax= setting
defaults to 16384, 33% inside of containers is usually equivalent to 5406,
which should still be ample space.
To summarize:
| on the host | in the container
old default | 12288 | 12288
new default | 10813 | 5406
|
|
The various bits of code did the scaling all different, let's unify this,
given that the code is not trivial.
|
|
And port a couple of users over to it.
|
|
The deserialize_timestamp_value() is renamed timestamp_deserialize() to be more
consistent with dual_timestamp_deserialize()
And add the NULL check back on realtime and monotonic
|
|
which is introduced in the ebf30a086d commit.
|
|
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
|
|
manager_{start,stop}_{slice,scope,unit} functions had an optional job
output parameter. But all callers specified job, so make the parameter
mandatory, add asserts. Also extract common job variable handling to
a helper function to avoid duplication.
Avoids gcc warning about job being unitialized.
|
|
Compare errno with zero in a way that tells gcc that
(if the condition is true) errno is positive.
|
|
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
|
|
If the last reference to a user is released, we queue stop-jobs for the
user-service and slice. Only once those are finished, we drop the
user-object. However, if a new session is opened before the user object is
fully dropped, we currently incorrectly re-use the object. This has the
effect, that we get stale sessions without a valid "systemd --user"
instance.
Fix this by properly allowing user_start() to be called, even if
user->stopping is true.
|
|
Just like user->slice, there is no reason to store the unit name in /run,
nor should we allocate it dynamically on job instantiation/removal. Just
keep it statically around at all times and rely on user->started ||
user->stopping to figure out whether the unit exists or not.
|
|
Few changes to user_new() and user_free():
- Use _cleanup_(user_freep) in constructor
- return 'int' from user_new()
- make user_free() deal with partially initialized objects
- keep reverse-order in user_free() compared to user_new()
- make user_free() return NULL
- make user_free() accept NULL as no-op
|
|
Currently, we allocate user->slice when starting a slice, but we never
release it. This is incompatible if we want to re-use a user object once
it was stopped. Hence, make sure user->slice is allocated statically on
the user object and use "u->started || u->stopping" as an indication
whether the slice is actually available on pid1 or not.
|
|
Lets not pretend we support changing XDG_RUNTIME_DIR via logind state
files. There is no reason to ever write the string into /run, as we
allocate it statically based on the UID, anyway. Lets stop that and just
allocate the runtime_path in "struct User" at all times.
We keep writing it into the /run state to make sure pam_systemd of
previous installs can still read it. However, pam_systemd is now fixed to
allocate it statically as well, so we can safely remove that some time in
the future.
Last but not least: If software depends on systemd, they're more than free
to assume /run/user/$uid is their runtime dir. Lets not require sane
applications to query the environment to get their runtime dir. As long as
applications know their login-UID, they should be safe to deduce the
runtime dir.
|
|
This new setting configures the TasksMax= field for the slice objects we
create for each user.
This alters logind to create the slice unit as transient unit explicitly
instead of relying on implicit generation of slice units by simply
starting them. This also enables us to set a friendly description for
slice units that way.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
There are more than enough to deserve their own .c file, hence move them
over.
|
|
This really deserves its own file, given how much code this is now.
|
|
- Rely everywhere that we use abs() on the error code passed in anyway,
thus don't need to explicitly negate what we pass in
- Never attach synthetic error number information to log messages. Only
log about errors we *receive* with the error number we got there,
don't log any synthetic error, that don#t even propagate, but just eat
up.
- Be more careful with attaching exactly the error we get, instead of
errno or unrelated errors randomly.
- Fix one occasion where the error number and line number got swapped.
- Make sure we never tape over OOM issues, or inability to resolve
specifiers
|
|
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.
Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
|
|
This replaces this:
free(p);
p = NULL;
by this:
p = mfree(p);
Change generated using coccinelle. Semantic patch is added to the
sources.
|
|
Some places invoked fflush() directly with their own manual error
checking, let's unify all that by using fflush_and_check().
This also unifies the general error paths of fflush()+rename() file
writers.
|
|
logind: save /run/systemd/users/UID before starting user@.service
|
|
Previously, this had a race condition during a user's first login.
Some component calls CreateSession (most likely by a PAM service
other than 'systemd-user' running pam_systemd), with the following
results:
- logind:
* create the user's XDG_RUNTIME_DIR
* tell pid 1 to create user-UID.slice
* tell pid 1 to start user@UID.service
Then these two processes race:
- logind:
* save information including XDG_RUNTIME_DIR to /run/systemd/users/UID
- the subprocess of pid 1 responsible for user@service:
* start a 'systemd-user' PAM session, which reads XDG_RUNTIME_DIR
and puts it in the environment
* run systemd --user, which requires XDG_RUNTIME_DIR in the
environment
If logind wins the race, which usually happens, everything is fine;
but if the subprocesses of pid 1 win the race, which can happen
under load, then systemd --user exits unsuccessfully.
To avoid this race, we have to write out /run/systemd/users/UID
even though the service has not "officially" started yet;
previously this did an early-return without saving anything.
Record its state as OPENING in this case.
Bug: https://github.com/systemd/systemd/issues/232
Reviewed-by: Philip Withnall <philip.withnall@collabora.co.uk>
|
|
As discussed in #257: we should ensure the selinux label is correctly
applied to each user's XDG_RUNTIME_DIR.
|
|
Let's use it as initializer where appropriate.
|
|
Fix CID 1304686: Dereference after null check (FORWARD_NULL)
However, this commit does not fix any bug in logind. It helps to keep
the elect_display_compare() function generic.
|
|
|
|
The previous implementation of user_elect_display() could easily end up
overwriting the user’s valid graphical session with a new TTY session.
For example, consider the situation where there is one session:
c1, type = SESSION_X11, !stopping, class = SESSION_USER
it is initially elected as the user’s display (i.e. u->display = c1).
If another session is started, on a different VT, the sessions_by_user
list becomes:
c1, type = SESSION_X11, !stopping, class = SESSION_USER
c2, type = SESSION_TTY, !stopping, class = SESSION_USER
In the previous code, graphical = c1 and text = c2, as expected.
However, neither graphical nor text fulfil the conditions for setting
u->display = graphical (because neither is better than u->display), so
the code falls through to check the text variable. The conditions for
this match, as u->display->type != SESSION_TTY (it’s actually
SESSION_X11). Hence u->display is set to c2, which is incorrect, because
session c1 is still valid.
Refactor user_elect_display() to use a more explicit filter and
pre-order comparison over the sessions. This can be demonstrated to be
stable and only ever ‘upgrade’ the session to a more graphical one.
https://bugs.freedesktop.org/show_bug.cgi?id=90769
|
|
This makes path_is_mount_point() consistent with fd_is_mount_point() wrt.
flags.
|
|
A variety of changes:
- Make sure all our calls distuingish OOM from other errors if OOM is
not the only error possible.
- Be much stricter when parsing escaped paths, do not accept trailing or
leading escaped slashes.
- Change unit validation to take a bit mask for allowing plain names,
instance names or template names or an combination thereof.
- Refuse manipulating invalid unit name
|
|
|
|
- Move to its own file rm-rf.c
- Change parameters into a single flags parameter
- Remove "honour sticky" logic, it's unused these days
|
|
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
|
|
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
|
|
In containers without CAP_SYS_ADMIN, it is not possible to mount tmpfs
(or any filesystem for that matter) on top of /run/user/$UID.
Previously, logind just failed in such a situation.
Now, logind will resort to chown+chmod of the directory instead. This
allows logind still to work in those environments, although without the
guarantees it provides (i.e. users not being able to DOS /run or other
users' /run/user/$UID space) when CAP_SYS_ADMIN is available.
|
|
If setup of per-user runtime dir fails, clean up afterwards by removing
the directory before returning from the function, so we don't leave the
directory behind.
If this is not done, the second time the user logs in logind would
assume that the directory is already set up, even though it isn't.
|