Age | Commit message (Collapse) | Author |
|
This adds three kinds of file system locks for container images:
a) a file system lock next to the actual image, in a .lck file in the
same directory the image is located. This lock has the benefit of
usually being located on the same NFS share as the image itself, and
thus allows locking container images across NFS shares.
b) a file system lock in /run, named after st_dev and st_ino of the
root of the image. This lock has the advantage that it is unique even
if the same image is bind mounted to two different places at the same
time, as the ino/dev stays constant for them.
c) a file system lock that is only taken when a new disk image is about
to be created, that ensures that checking whether the name is already
used across the search path, and actually placing the image is not
interrupted by other code taking the name.
a + b are read-write locks. When a container is booted in read-only mode
a read lock is taken, otherwise a write lock.
Lock b is always taken after a, to avoid ABBA problems.
Lock c is mostly relevant when renaming or cloning images.
|
|
"read-only" concept for raw disk images, too
|
|
Unlike some client code suggests...
|
|
cunescape_length_with_prefix() is called with the length as an
argument, so it cannot rely on the buffer being NUL terminated.
Move the length check before accessing the memory.
When an incomplete escape sequence was given at the end of the
buffer, c_l_w_p() would read past the end of the buffer. Fix this
and add a test.
|
|
This fixes parsing of options in shared/generator.c. Existing code
had some issues:
- it would treate whitespace and semicolons as seperators. fstab(5)
is pretty clear that only commas matter. And the syntax does
not allow for spaces to be inserted in the field in fstab.
Whitespace might be escaped, but then it should not seperate
options. Treat whitespace and semicolons as any other character.
- it assumed that x-systemd.device-timeout would always be followed
by "=". But this is not guaranteed, hasmntopt will return this
option even if there's no value. Uninitialized memory could be read.
- some error paths would log, and inconsistently, some would just
return an error code.
Filtering is split out to a separate function and tests are added.
Similar code paths in other places are adjusted to use the new function.
|
|
Sometimes it is necessary to stop a generator from running. Either
because of a bug, or for testing, or some other reason. The only way
to do that would be to rename or chmod the generator binary, which is
inconvenient and does not survive upgrades. Allow masking and
overriding generators similarly to units and other configuration
files.
For the systemd instance, masking would be more common, rather than
overriding generators. For the user instances, it may also be useful
for users to have generators in $XDG_CONFIG_HOME to augment or
override system-wide generators.
Directories are searched according to the usual scheme (/usr/lib,
/usr/local/lib, /run, /etc), and files with the same name in higher
priority directories override files with the same name in lower
priority directories. Empty files and links to /dev/null mask a given
name.
https://bugs.freedesktop.org/show_bug.cgi?id=87230
|
|
Remove the optional sepearate opening of the directory,
it would be just too complicated with the change to
multiple directories.
Move the middle of execute_directory() to a seperate
function to make it easier to grok.
|
|
fd_setcrtime()
|
|
btrfs' COW logic results in heavily fragment journal files, which is
detrimental for perfomance. Hence, turn off COW for journal files as we
create them.
Turning off COW comes at the cost of data integrity guarantees, but this
should be acceptable, given that we do our own checksumming, and
generally have a pretty conservative write pattern.
Also see discussion on linux-btrfs:
http://www.spinics.net/lists/linux-btrfs/msg41001.html
|
|
connected terminal
So far, if we had no knowledge about the correct $TERM we defaulted to
v102, as a safe, conservative choice. However, the terminfo data for
vt102 is not aware of pageup/pagedown, which makes "less" much harder
work with than necessary. Setting vt220 allows them to work correctly.
"vt220" should be a sufficiently safe choice too, given that xterm,
gnome-terminal and the linux console all strive to implement vt220 as
baseline, already to pass pageup/pagedown correctly to apps.
Effectively, with this change "journalctl -e" run inside a
"systemd-nspawn" terminal will now run a pager where pageup/pagedown
works, which is quite an improvement of usability for containers.
|
|
|
|
from an obstructed mounted
|
|
With this change it is possible to send file descriptors to PID 1, via
sd_pid_notify_with_fds() which PID 1 will store individually for each
service, and pass via the usual fd passing logic on next invocation.
This is useful for enable daemon reload schemes where daemons serialize
their state to /run, push their fds into PID 1 and terminate, restoring
their state on next start from the data in /run and passed in from PID
1.
The fds are kept by PID 1 as long as no POLLHUP or POLLERR is seen on
them, and the service they belong to are either not dead or failed, or
have a job queued.
|
|
When setting up a namespace, mount flags like noexec, nosuid and
nodev are cleared, so the mounts always have exec, suid and dev
flags enabled.
Copy source directory mount flags to target mount when remounting
the bind mounts.
|
|
Regression introduced by ed757c0cb03eef50e8d9aeb4682401c3e9486f0b
Mirror the implementation of columns(), since the fd_columns()
functions returns a negative integer for errors.
Also fix columns() to return the unsigned variable instead of the
signed intermediary (they're the same, but better to be explicit).
|
|
|
|
machine images
|
|
|
|
subvolumes
We make use of the btrfs subvol crtime for this, and for gpt images of a
manually managed xattr, if we can.
|
|
|
|
There is alot of cleanup that will have to happen to turn on
-fstrict-aliasing, but I think our code should be "correct" to the rule.
|
|
on a pty and returns the pty master fd to the client
This is a one-stop solution for "machinectl login", and should simplify
getting logins in containers.
|
|
|
|
For that, ask machined for a container PTY and use that.
|
|
|
|
hidden_file() is a bit more precise, since dot files usually shouldn't
be ignored, but certainly be considered hidden.
|
|
extra "#" to the name
That way, we have a simple, somewhat reliable way to detect such
temporary files, by simply checking if they start with ".#".
|
|
deleting it)
|
|
Commit a2a5291b3f5 changed the parser to reject unfinished quoted
strings. Unfortunately it introduced an error where a trailing
backslash would case an infinite loop. Of course this must fixed, but
the question is what to to instead. Allowing trailing backslashes and
treating them as normal characters would be one option, but this seems
suboptimal. First, there would be inconsistency between handling of
quoting and of backslashes. Second, a trailing backslash is most
likely an error, at it seems better to point it out to the user than
to try to continue.
Updated rules:
ExecStart=/bin/echo \\ → OK, prints a backslash
ExecStart=/bin/echo \ → error
ExecStart=/bin/echo "x → error
ExecStart=/bin/echo "x"y → error
|
|
dup3() allows setting O_CLOEXEC which we are not interested in. However,
it also fails if called with the same fd as input and output, which is
something we don't want. Hence use dup2().
Also, we need to explicitly turn off O_CLOEXEC for the fds, in case the
input fd was O_CLOEXEC and < 3.
|
|
system of the OS
This works now:
# systemd-nspawn -xb -D / -M foobar
Which boots up an ephemeral container, based on the host's root file
system. Or in other words: you can now run the very same host OS you
booted your system with also in a container, on top of it, without
having it interfere. Great for testing whether the init system you are
hacking on still boots without reboot the system!
|
|
|
|
resulting name is actually valid
Also, rename filename_is_safe() to filename_is_valid(), since it
actually does a full validation for what the kernel will accept as file
name, it's not just a heuristic.
|
|
|
|
loop_write() didn't follow the usual systemd rules and returned status
partially in errno and required extensive checks from callers. Some of
the callers dealt with this properly, but many did not, treating
partial writes as successful. Simplify things by conforming to usual rules.
|
|
Let's add some syntactic sugar for iterating through inotify events, and
use it everywhere.
|
|
As kdbus no longer exports this, remove all traces from sd-bus too
|
|
environ is already defined in unistd.h
|
|
machine-id
If /etc was read only at boot time with an empty /etc/machine-id, the latter
will be mounted as a tmpfs and get reset at each boot. If the system becomes rw
later, this functionality enables to commit in a race-free manner the
transient machine-id to disk.
|
|
https://bugs.debian/org/771397
|
|
|
|
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
|
|
Using:
find . -name '*.[ch]' | while read f; do perl -i.mmm -e \
'local $/;
local $_=<>;
s/(if\s*\([^\n]+\))\s*{\n(\s*)(log_[a-z_]*_errno\(\s*([->a-zA-Z_]+)\s*,[^;]+);\s*return\s+\g4;\s+}/\1\n\2return \3;/msg;
print;'
$f
done
And a couple of manual whitespace fixups.
|
|
It corrrectly handles both positive and negative errno values.
|
|
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
|
|
|
|
On the contrary of env, the added function returns all characters
cescaped, because it improves reproducibility.
|
|
|
|
/proc/[pid]/cwd and /proc/[pid]/root are symliks to corresponding
directories
The added functions returns values of that symlinks.
|
|
|