summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2017-02-22Rename cg_is_unified_systemd_controller_wanted to cg_is_hybrid_wantedZbigniew Jędrzejewski-Szmek
Less typing and doesn't make the table so incredibly wide.
2017-02-20build.h: include default cgroup hierarchy setting in --version outputZbigniew Jędrzejewski-Szmek
This is pretty important, and we print this string during startup, so putting the default hierarchy information might help with diagnosis if things go awry. $ ./systemctl --version systemd 232 +PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN default-hierarchy=legacy v2: make the message nicer by including the ./configure option argument directly in output
2017-02-20pid1: add ./configure switch to select default cgroup hierarchyZbigniew Jędrzejewski-Szmek
The default default is set to "legacy", with "hybrid" and "unified" being the other two alternatives. There invert the behaviour for systemd.legacy_systemd_cgroup_controller: if it is not specified on the kernel command line, "hybrid" is used if selected as the default. If this option is specified, "hybrid" is used if false, and full "legacy" if true. Also make all fields in the configure summary lowercase (unless they are capitalized names) for consistency. v2: - update for the fixed interpreation of systemd.legacy_systemd_cgroup_controller
2017-02-20core: keep supporting cgroup hybrid layout from v232 for live upgradesTejun Heo
v232's cgroup hybrid mode mounted v2 on /sys/fs/cgroup/systemd, which unfortunately broke other tools which expect v1 there. From v233 on, hybrid mode instead mounts and uses v2 on /sys/fs/cgroup/unified and keeps /sys/fs/cgroup/systemd on v1 for compatibility with external tools. However, to keep systemd live upgrades working, v233+ should be able to recognize v232 layout and keep using it. This patch adds v232 hybrid mode support. If v232 layout is detected, cg_unified(SYSTEMD_CGRouP_CONTROLLER) keeps returning %true but cg_hybrid_unified() returns %false. This keeps process management on cgroup v2 but turns off the parallel layout.
2017-02-20core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd ↵Tejun Heo
hierarchy Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1 name=systemd hierarchy. While this works fine for systemd itself, it breaks tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd. This patch updates the hybrid mode so that it mounts v2 hierarchy on /sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on /sys/fs/cgroup/systemd for compatibility. systemd itself doesn't depend on the "name=systemd" hierarchy at all. All operations take place on the v2 hierarchy as before but the v1 hierarchy is kept in sync so that any tools which expect it to be there can keep doing so. This allows systemd to take advantage of cgroup v2 process management without requiring other tools to be aware of the hybrid mode. The hybrid mode is implemented by mapping the special systemd controller to /sys/fs/cgroup/unified and making the basic cgroup utility operations - cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the /sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated. While a bit messy, this will allow dropping complications from using cgroup v1 for process management a lot sooner than otherwise possible which should make it a net gain in terms of maintainability. v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount point to /sys/fs/cgroup/unified as suggested by @brauner. v3: chown the compat hierarchy too on delegation. Suggested by @evverx. v4: [zj] - drop the change to default, full "legacy" is still the default.
2017-02-20cgroup-util: fix the reversed return value of ↵Zbigniew Jędrzejewski-Szmek
cg_is_unified_systemd_contoller_wanted 1d84ad944520fc3e062ef518c4db4e1 reversed the meaning of the option. The kernel command line option has the opposite meaning to the function, i.e. specifying "legacy=yes" means "unifed systemd controller=no".
2017-02-18core: make SYSTEMD_CGROUP_CONTROLLER a special stringTejun Heo
SYSTEMD_CGROUP_CONTROLLER is currently defined as "name=systemd" which cgroup utility functions interpret as a named cgroup hierarchy with the specified named. With the planned cgroup hybrid mode changes, SYSTEMD_CGROUP_CONTROLLER would map to different hierarchy names. This patch makes SYSTEMD_CGROUP_CONTROLLER a special string "_systemd" which is substituted to "name=systemd" by the cgroup utility functions. This allows the callers to address the systemd hierarchy without actually specifying the hierarchy name allowing the cgroup utility functions to map it to whatever is appropriate. Note that SYSTEMD_CGROUP_CONTROLLER was already special on full unified cgroup hierarchy even before this patch.
2017-02-18core: simplify cg_[all_]unified()Tejun Heo
cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchrnously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().
2017-02-18nspawn: fix cgroup mode detectionTejun Heo
cgroup mode detection is broken in two different ways. * detect_unified_cgroup_hierarchy() is called too nested in outer_child(). sync_cgroup() which is used by run() also needs to know the requested cgroup mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN. This makes it skip syncing the inner cgroup hierarchy on some config combinations. $ cat /proc/self/cgroup | grep systemd 1:name=systemd:/user.slice/user-0.slice/session-c1.scope $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container ... [root@container ~]# cat /proc/self/cgroup | grep systemd 1:name=systemd:/machine.slice/machine-container.x86_64.scope $ exit $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container [root@container ~]# cat /proc/self/cgroup | grep 0:: 0::/ $ exit Note how the unified hierarchy case's path is not synchronized with the host. This for example can cause issues when there are multiple such containers. Fixed by moving detect_unified_cgroup_hierarchy() invocation to main(). * inner_child() was invoking cg_unified_flush(). inner_child() executes fully scoped and can't determine which cgroup mode the host was in. It doesn't make sense to keep flushing the detected mode when the host mode can't change. Fixed by replacing cg_unified_flush() invocations in outer_child() and inner_child() with one in main().
2017-02-18journalctl: add reference to sd-id128(3) to output (#5382)Lucas Werkmeister
SD_ID128_MAKE is clearly not a standard C macro, so let’s point the user to its documentation to let them know which header they need and what they can then do with MESSAGE_XYZ.
2017-02-18Merge pull request #5369 from poettering/nspawn-resolvedZbigniew Jędrzejewski-Szmek
fixes for running nspawn+resolved in combination
2017-02-17nspawn: tweak check whether resolved is around a bitLennart Poettering
Let's check D-Bus instead of files in /run to see if resolved is running. This is a bit nicer as bus names are automatically cleaned up when resolved dies, which is not the case for files in /run. See: #4649
2017-02-17test: re-drop assumption that /run is a mount point (#5377)Martin Pitt
Commit 436e916ea introduced the assumption into test-stat-util that /run is a tmpfs mount point. This is not the case in build chroots such as Fedora's mock or Debian's sbuild. So only assert that /run is a tmpfs and not a btrfs if /run is actually a mount point. This will then still be asserted with installed tests.
2017-02-17systemctl: show extra args if defined (#5379)Adrián López
2017-02-17Merge pull request #5373 from poettering/coredump-timestamp-fixesZbigniew Jędrzejewski-Szmek
various coredump fixes
2017-02-17Merge pull request #5347 from poettering/local-ntaZbigniew Jędrzejewski-Szmek
more resolved fixes
2017-02-17missing: add renameat2() definition for 64bit arm (#5378)Lennart Poettering
Following a similar commit in casync: https://github.com/systemd/casync/pull/10
2017-02-17Merge pull request #5275 from ssahani/fix-dropin-net-sectionLennart Poettering
networkd: fix drop-in conf directory configs overwriting each other
2017-02-17udev: fix id_net_name_path for virtio-ccw interfaces (#5357)Viktor Mihajlovski
The CCW id_net_name_path detection didn't account for virtio interfaces on the CCW bus. As a result the default interface names for virtio-ccw interfaces would use the old eth<x> format instead of enc<busid>. Since virtio-pci interface naming follows the naming rules of the parent bus, the names_ccw() logic was changed to apply the CCW interface naming rules to virtio interfaces as well, e.g. enc2000 for an interface with a CCW bus id 0.0.2000. As virtio interfaces are apt to get the otherwise unusual CCW bus id 0.0.0000, the last '0' is now preserved in this case. The virtio subsystem skipping loop has been moved from names_pci() into a function skip_virtio() that can be reused for all bus types with virtio network devices. Since virtio-ccw interfaces use single CCW addresses the ccwgroup requirement was relaxed and the C definitions were changed accordingly.
2017-02-17network: change condition in if testing section presenceZbigniew Jędrzejewski-Szmek
section_line and filename should be set together or not at all. Change the if to test filename, since it's the first of the pair and it seems more natural to test that.
2017-02-17networkd: immediately transfer ownership of route->sectionZbigniew Jędrzejewski-Szmek
The code was not incorrect previously, but I think it's easier to follow the ownership (and the code is more likely to remain correct when updated later on), if freeing of NetworkConfigSection* is immediately made the responsibility of route_free(), so instead of relying on route_free() not freeing ->section if adding to the network hashmap failed, make this freeing unconditional.
2017-02-17Merge pull request #5333 from poettering/machined-copy-files-usernsLennart Poettering
machined userns fixes
2017-02-17coredump: store the full coredump kernel context in xattrs on the coredump fileLennart Poettering
We didn't include the resource limit field, add it.
2017-02-17coredump: when reconstructing original kernel coredump context, chop off ↵Lennart Poettering
trailing zeroes Our coredump handler operates on a "context" supplied by the kernel via the core_pattern arguments. When we pass off a coredump for processing to coredumpd we pass along enough information for this context to be reconstructed. This information is passed in the usual journal fields, and that means we extended the 1s granularity timestamp to 1µs granularity by appending 6 zeroes. We need to chop them off again when reconstructing the original kernel context. Fixes: #4779
2017-02-17udevd: use signal_to_string() instead of strsignal() at one placeLennart Poettering
strsignal() sucks, as it tries to generate human readable strings from something that isn't really human readable by concept. Let's use signal_to_string() instead, making this more grokkable. Difference is: SIGINT gets translated → "SIGINT" rather than → "Interrupted".
2017-02-17coredump: include signal name in journal metadataLennart Poettering
(Note that we only do this for the journal metadata, not for the xattrs, as the xattrs are only supposed to store the original 1:1 info we acquired from the kernel.)
2017-02-17coredump: fix handling of special crashesLennart Poettering
When we encounter a "special" crash we should not continue processing it the usual way.
2017-02-17resolved: try to authenticate SOA on negative repliesLennart Poettering
For caching negative replies we need the SOA TTL information. Hence, let's authenticate all auxiliary SOA RRs through DS requests on all negative requests.
2017-02-17resolved: extend various timeoutsLennart Poettering
Let's increase a number of timeouts as they apparently are too short for some real-world lookups. See: https://github.com/systemd/systemd/issues/4003#issuecomment-279842616 In particular we change the following timeouts: 1) The first UDP retry we increase 500ms → 750ms. This is a good idea, since some servers need relatively long responses for trivial lookups, and giving up our first attempt also has the effect of trying a different server for the next attempt which has the side effect that we'll run two down-grade iterations in parallel, on both servers. Hence, let's give servers a bit more time in the first iteration. 2) Permit 24 retries instead of just 16 per transactions. If we end up downgrading all the way down to UDP for a lookup we already need 5 iterations for that. If we want permit a couple of lost packages for each (let's say 4), then we already need 20 iterations. 3) Increase the overall query timeout on the service side to 60s (from 45s), simply because very long and slow DNSSEC + CNAME chains (such as us.ynuf.alipay.com) hit this boundary too easily. The client side timeout for the bus method call is increased to 90s, in order to have room for the dbus reply to go through
2017-02-17resolved: initialize all return values on successful exit of dns_cache_lookup()Lennart Poettering
Following our coding style on success we should initialize all return parameters of a function. We missed to cases for dns_cache_lookup() (but covered all others), fix them too.
2017-02-17resolved: show rcode in debug output for incoming repliesLennart Poettering
This is the most important piece of information of replies, hence show this in the first log message about it. (Wireshark shows it too in the short summary, hence this definitely makes sense...)
2017-02-17resolved: don't downgrade feature level if we get RCODE on UDP levelLennart Poettering
Retrying a transaction via TCP is a good approach for mitigating packet loss. However, it's not a good away way to fix a bad RCODE if we already downgraded to UDP level for it. Hence, don't do this. This is a small tweak only, but shortens the time we spend on downgrading when a specific domain continously returns a bad rcode.
2017-02-17resolved: cache SERVFAIL responses for 30sLennart Poettering
Some domains (such as us.ynuf.alipay.com) almost appear as if they actively want to sabotage our DNSSEC work. Specifically, they unconditionally return SERVFAIL on SOA lookups and always only after a 1s delay (at least). This is pretty bad for our validation logic, as we use SOA lookups to distuingish zones from non-terminal names. Moreover, SERVFAIL is an error that is typically returned if we send requests a server doesn't grok, and thus is reason for us to downgrade our protocol and try again. In case of these zones this means we'll accept the SERVFAIL response only after a full iterative downgrade to our lowest feature level: TCP. In combination with the 1s delays this has the effect of making us hit our transaction timeout way to easily. As first attempt to improve the situation: let's start caching SERVFAIL responses in our cache, after the full downgrade for a short period of time. Conceptually this is exposed as "weird rcode" caching, but for now we only consider SERVFAIL a "weird rcode" worthy of caching. Later on we might want to add more.
2017-02-17resolved: lengthen timeout for TCP transactionsLennart Poettering
When we are doing a TCP transaction the kernel will automatically resend all packets for us, there's no need to do that ourselves. Hence: increase the timeout for TCP transactions substantially, to give the kernel enough time to connect to the peer, without interrupting it when we become impatient.
2017-02-17resolved: when DNSSEC mode is disabled, don't go beyond EDNS0 feature levelLennart Poettering
There's no point in talking to a server in DNSSEC mode when we don't actually want to verify anything. See: #5352
2017-02-17resolved: when accepted a query candidate as final answer, propagate ↵Lennart Poettering
authentication bool even on failure Let's make sure that if we accept a query candidate, then let's also propagate the authenticated flag for it, so that we can properly report back to the clients whether lookups failed due to non-existance that can be proven.
2017-02-17resolved: propagate AD bit for NXDOMAIN into stub repliesLennart Poettering
When we managed to prove non-existance of a name, then we should properly propagate this to clients by setting the AD bit on NXDOMAIN. See: #4621
2017-02-17resolved: automatically downgrade reply bits on sendLennart Poettering
Doesn't really change anything, but makes things a bit simpler to read.
2017-02-17resolved: when the dns server feature level grace period elapses, flush cachesLennart Poettering
The cache might contain all kinds of unauthenticated data that we really shouldn't be using if we upgrade our feature level and suddenly are able to get authenticated data again. Might fix: #4866
2017-02-17resolved: fix NSEC proofs for missing TLDsLennart Poettering
For the wildcard NSEC check we need to generate an "asterisk" domain, by prepend the common ancestor with "*.". So far we did that with a simple strappenda() which is fine for most domains, but doesn't work if the common ancestor is the root domain as we usually write that as "." in normalized form, and "*." joined with "." is "*.." and not "*." as it should be. Hence, use the clean way out, let's just use dns_name_concat() which only exists precisely for this reason, to properly concatenate labels. There's a good chance this actually fixes #5029, as this NSEC proof is triggered by lookups in the TLD "example", which doesn't exist in the Internet.
2017-02-17resolved: make sure configured NTAs affect subdomains tooLennart Poettering
This ensures that configured NTAs exclude not only the listed domain but also all domains below it from DNSSEC validation -- except if a positive trust anchor is defined below (as suggested by RFC7647, section 1.1) Fixes: #5048
2017-02-17machined: refuse bind mounts on containers that have user namespaces appliedLennart Poettering
As the kernel won't map the UIDs this is simply not safe, and hence we should generate a clean error and refuse it. We can restore this feature later should a "shiftfs" become available in the kernel.
2017-02-17machined: properly propagate long-running operation errorsLennart Poettering
Actually initialize the "error" structure with the error we got
2017-02-17machined: when copying files from/to userns containers chown to rootLennart Poettering
This changes the file copy logic of machined to set the UID/GID of all copied files to 0 if the host and container do not share the same user namespace. Fixes: #4078
2017-02-17copy: change the various copy_xyz() calls to take a unified flags parameterLennart Poettering
This adds a unified "copy_flags" parameter to all copy_xyz() function calls, replacing the various boolean flags so far used. This should make many invocations more readable as it is clear what behaviour is precisely requested. This also prepares ground for adding support for more modes later on.
2017-02-17machinectl: tweak address output in "machinectl status"Lennart Poettering
With this change we'll not show an "Addresses" field for machines that we don't know any addresses for. This changes print_addresses() to never suffix its output with a newline, leaving that to the caller. That's a good idea since depending on who the caller is, different rules apply: if no addresses are found, then the list view still wants a newline, but the status view does not. This also changes the function to return the number of found addresses, which can be used to decide when to add a newline or not.
2017-02-17machined: expose "UID shift" concept for containersLennart Poettering
UID/GID mapping with userns can be arbitrarily complex. Let's break this down to a single admin-friendly parameter: let's expose the UID/GID shift of a container via a new bus call for each container, and let's show this as part of "machinectl status" if it is not 0. This should work for pretty much all real-life full OS container setups (i.e. the stuff machined is suppose to be useful for). For everything else we generate a clean error, clarifying that we can't expose the mapping.
2017-02-17resolved: default to the compile-time fallback hostnameLennart Poettering
This changes resolved to use the compile-time fallback hostname the configured one is not set. Note that if the local hostname is set to "localhost" then we'll instead default to "linux" here, as for mDNS/LLMNR exposing "localhost" is actively dangerous.
2017-02-17core: when booting up, initialize hostname to compile-time fallback hostnameLennart Poettering
When /etc/hostname isn't set, default to the configured compile-time fallback hostname instead of "localhost" for the kernel hostname.
2017-02-17hostname-util: default to the compile time default hostname in ↵Lennart Poettering
gethostname_malloc() Currently, if the hostname is not set gethostname_malloc() defaults to the "sysname", which is "linux" on Linux. Let's change that to also honour the compile-time fallback hostname as specified on the configure command line.