Age | Commit message (Collapse) | Author |
|
Reported by: James Lott <james@lottspot.com>
|
|
Use SETLINK when modifying an existing link.
|
|
|
|
|
|
container side get fixed MAC addresses
|
|
Since b5eca3a2059f9399d1dc52cbcf9698674c4b1cf0 we don't attempt to GC
busses anymore when unsent messages remain that keep their reference,
when they otherwise are not referenced anymore. This means that if we
explicitly want connections to go away, we need to close them.
With this change we will no do so explicitly wherver we connect to the
bus from a main program (and thus know when the bus connection should go
away), or when we create a private bus connection, that really should go
away after our use.
This fixes connection leaks in the NSS and PAM modules.
|
|
getopt is usually good at printing out a nice error message when
commandline options are invalid. It distinguishes between an unknown
option and a known option with a missing arg. It is better to let it
do its job and not use opterr=0 unless we actually want to suppress
messages. So remove opterr=0 in the few places where it wasn't really
useful.
When an error in options is encountered, we should not print a lengthy
help() and overwhelm the user, when we know precisely what is wrong
with the commandline. In addition, since help() prints to stdout, it
should not be used except when requested with -h or --help.
Also, simplify things here and there.
|
|
Based on patch by Michael Marineau <michael.marineau@coreos.com>:
When deriving the network interface name from machine name strncpy was
not properly null terminating the string and the maximum string size as
returned by strlen() is actually IFNAMSIZ-1, not IFNAMSIZ.
|
|
String which ended in an unfinished quote were accepted, potentially
with bad memory accesses.
Reject anything which ends in a unfished quote, or contains
non-whitespace characters right after the closing quote.
_FOREACH_WORD now returns the invalid character in *state. But this return
value is not checked anywhere yet.
Also, make 'word' and 'state' variables const pointers, and rename 'w'
to 'word' in various places. Things are easier to read if the same name
is used consistently.
mbiebl_> am I correct that something like this doesn't work
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-passwd "Unlock EncFS"'
mbiebl_> systemd seems to strip of the quotes
mbiebl_> systemctl status shows
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-password Unlock EncFS $RootDir $MountPoint
mbiebl_> which is pretty weird
|
|
Explicitly initalize descriptors using explicit assignment like
bus_error. This makes barriers follow the same conventions as
everything else and makes things a bit simpler too.
Rename barier_init to barier_create so it is obvious that it is
not about initialization.
Remove some parens, etc.
|
|
I dropped the cleanup-helper before pushing so use _cleanup_() directly.
|
|
The Barrier-API simplifies cross-fork() synchronization a lot. Replace the
hard-coded eventfd-util implementation and drop it.
Compared to the old API, Barriers also handle exit() of the remote side as
abortion. This way, segfaults will not cause the parent to deadlock.
EINTR handling is currently ignored for any barrier-waits. This can easily
be added, but it isn't needed so far so I dropped it. EINTR handling in
general is ugly, anyway. You need to deal with pselect/ppoll/... variants
and make sure not to unblock signals at the wrong times. So genrally,
there's little use in adding it.
|
|
|
|
(ephemeral) mode
Two modes are supported: --volatile=yes mounts only /usr into the
container, and a tmpfs as root directory. --volatile=state mounts the
full OS tree in, but overmounts /var with a tmpfs.
--volatile=yes hence boots with an unpopulated /etc and /var, starting
with pristine configuration and state.
--volatile=state hence boots with an unpopulated /var, only starting
with pristine state.
|
|
THis way we can remove cgroup priviliges after setup, but get them back
for the next restart, as we need it.
|
|
Let's protect ourselves against the recently reported docker security
issue. Our man page makes clear that we do not make any security
promises anyway, but well, this one is easy to mitigate, so let's do it.
While we are at it block a couple of more syscalls that are no good in
containers, too.
|
|
|
|
|
|
This is at the suggestion of Djalal Harouni on the mailing list, and
reflects the behavior of shared/util.c:wait_for_terminate_and_warn().
|
|
Commit 113cea8 introduced a bug that caused the exit code of systemd-nspawn
to not reflect the exit code of the program executed in the container.
|
|
This allows us to bootup a rootfs with a /usr directory only.
|
|
This allows us to bootup a rootfs with a /usr directory only.
|
|
|
|
The file should have been in /usr/lib/ in the first place, since it
describes the OS container in /usr (and not the configuration in /etc),
hence, let's support os-release files in /usr/lib as fallback if no
version in /etc exists, following the usual override logic.
A prior commit already enabled tmpfiles to create /etc/os-release as a
symlink to /usr/lib/os-release should it be missing, thus providing nice
compatibility with applications only checking in /etc.
While it's probably a good idea if all apps check both locations via a
fallback logic, it is only necessary in the early boot process, as long
as the /etc/os-release symlink has not been restored, in case we boot
with an empty /etc.
|
|
such as /var
|
|
|
|
For names like /var/lib/container/something, the message
becomes quite long. Better to split it.
Also reword the message not to suggest that ^]^]^] only works
in the beginning.
|
|
Instead of blindly creating another bind mount for read-only mounts,
check if there's already one we can use, and if so, use it. Also,
recursively mark all submounts read-only too. Also, ignore autofs mounts
when remounting read-only unless they are already triggered.
|
|
nspawn and the container child use eventfd to wait and notify each other
that they are ready so the container setup can be completed.
However in its current form the wait/notify event ignore errors that
may especially affect the child (container).
On errors the child will jump to the "child_fail" label and terminate
with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd
is created without the "EFD_NONBLOCK" flag, this leaves the parent
blocking on the eventfd_read() call. The container can also be killed
at any moment before execv() and the parent will not receive
notifications.
We can fix this by using cheap mechanisms, the new high level eventfd
API and handle SIGCHLD signals:
* Keep the cheap eventfd and EFD_NONBLOCK flag.
* Introduce eventfd states for parent and child to sync.
Child notifies parent with EVENTFD_CHILD_SUCCEEDED on success or
EVENTFD_CHILD_FAILED on failure and before _exit(). This prevents the
parent from waiting on an event that will never come.
* If the child is killed before execv() or before notifying the parent,
we install a NOP handler for SIGCHLD which will interrupt blocking calls
with EINTR. This gives a chance to the parent to call wait() and
terminate in main().
* If there are no errors, parent will block SIGCHLD, restore default
handler and notify child which will do execv(), then parent will pass
control to process_pty() to do its magic.
This was exposed in part by:
https://bugs.freedesktop.org/show_bug.cgi?id=76193
Reported-by: Tobias Hunger tobias.hunger@gmail.com
|
|
Move the container wait logic into its own wait_for_container() function
and add two status codes: CONTAINER_TERMINATED or CONTAINER_REBOOTED.
The status will be stored in its argument, this way we handle:
a) Return negative on failures.
b) Return zero on success and set the status to either
CONTAINER_REBOOTED or CONTAINER_TERMINATED.
These status codes are used to terminate nspawn or loop again in case of
CONTAINER_REBOOTED.
|
|
|
|
This undoes part of commit e6a4a517befe559adf6d1dbbadf425c3538849c9.
Instead of removing the error message about non-empty journal bind mount
directories, simply downgrade the message to a warning and proceed.
|
|
dentry
Currently if nspawn was called with --link-journal=host or
--link-journal=auto and the right /var/log/journal/machine-id/ exists
then the bind mount the subdirectory into the container might fail due
to the ~/mycontainer/var/log/journal/machine-id/ of the container not
being empty.
There is no reason to check if the container journal subdir is empty
since there will be a bind mount on top of it. The user asked for a bind
mount so give it.
Note: a next call with --link-journal=guest may fail due to the
/var/log/journal/machine-id/ on the host not being empty.
https://bugs.freedesktop.org/show_bug.cgi?id=76193
Reported-by: Tobias Hunger <tobias.hunger@gmail.com>
|
|
|
|
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018971.html
|
|
change_uid_gid() never initialises sz which may cause greedy_realloc to
skip the initial buffer allocation.
|
|
Use a static table with all the typing information, rather than repeated
switch statements. This should make it a lot simpler to add new types.
We need to keep all the type info to be able to create containers
without exposing their implementation details to the users of the library.
As a freebee we verify the types of appended/read attributes.
The API is extended to nicely deal with unions of container types.
|
|
safe_close_pair() is more like safe_close(), except that it handles
pairs of fds, and doesn't make and misleading allusion, as it works
similarly well for socketpairs() as for pipe()s...
|
|
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
|
|
|
|
|
|
With systemd 211 nspawn attempts to create the home directory for the
given uid. However, if the home directory already exists then it will
fail. Don't error out on -EEXIST.
|
|
We still need to make sure that no two MAC addresses are the same, so we use
a logic similar to what is used in udev to generate MAC addresses, and base
it on a hash of the host's machine ID and thecontainer's name.
|
|
discoverable partitions spec
|
|
|
|
|
|
|
|
/usr/bin/getent instead of in-process
When the container runs a different native architecture than the host we
shouldn't attempt to load the container's NSS modules with the host's
libc. Instead, resolve UID/GID by invoking /usr/bin/getent in the
container. The tool should be fairly universally available and allows us
to do resolving of the UID/GID with the container's libc in a parsable
format.
https://bugs.freedesktop.org/show_bug.cgi?id=75733
|
|
child after the parent added us to the device cgroup
|
|
We overmount /dev/console with an external pty anyway, hence there's no
point in using the real major/minor when we create the node to
overmount. Instead, use the one of /dev/null now.
This fixes a race against the cgroup device controller setup we are
using. In case /dev/console was create before the cgroup policy was
applied all was good, but if created in the opposite order the mknod()
would fail, since creating /dev/console is not allowed by it. Creating
/dev/null instances is however permitted, and hence use it.
|