summaryrefslogtreecommitdiff
path: root/src/basic
AgeCommit message (Collapse)Author
2015-09-01core: unified cgroup hierarchy supportLennart Poettering
This patch set adds full support the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possibly to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified heirarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicey, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefuly makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.
2015-09-01Merge pull request #1107 from msekletar/selinux-get-raw-contextLennart Poettering
selinux: always use *_raw API from libselinux
2015-09-01Merge pull request #1111 from poettering/more-cgroup-fixesTom Gundersen
More cgroup fixes
2015-09-01cgroup: the root cgroup is always populatedLennart Poettering
2015-09-01cgroup: drop "ignore_self" argument from cg_is_empty()Lennart Poettering
In all cases where the function (or cg_is_empty_recursive()) ignoring the calling process is actually wrong, as a process keeps a cgroup busy regardless if its the current one or another. Hence, let's simplify things and drop the "ignore_self" parameter.
2015-09-01cgroup: small cleanups and coding style fixesLennart Poettering
A number of simplications and adjustments to brings things closer to our coding style.
2015-09-01cgroup: don't allow hidden cgroupsLennart Poettering
We really should care for all cgroups, and not allow hidden ones.
2015-09-01cgroup: never migrate kernel threads out of the root cgroupLennart Poettering
It won't work anyway.
2015-09-01Merge pull request #1108 from phomes/dont-shadow-globalsDavid Herrmann
tree-wide: do not shadow the global var timezone
2015-09-01tree-wide: do not shadow the global var timezoneThomas Hindoe Paaboel Andersen
2015-09-01set: return NULL on destructorsLennart Poettering
Like we do it pretty much everywhere else.
2015-09-01selinux: always use *_raw API from libselinuxMichal Sekletar
When mcstransd* is running non-raw functions will return translated SELinux context. Problem is that libselinux will cache this information and in the future it will return same context even though mcstransd maybe not running at that time. If you then check with such context against SELinux policy then selinux_check_access may fail depending on whether mcstransd is running or not. To workaround this problem/bug in libselinux, we should always get raw context instead. Most users will not notice because result of access check is logged only in debug mode. * SELinux context translation service, which will translates labels to human readable form
2015-08-31process-util: trivial optimizationLennart Poettering
2015-08-31util: treat 'C' and 'POSIX' locale identicalLennart Poettering
2015-08-30extract_first_word: Refactor EXTRACT_DONT_COALESCE_SEPARATORS handlingFilipe Brandenburger
Refactor allocation of the result string to the top, since it is currently done in both branches of the condition. Remove unreachable code checking for EXTRACT_DONT_COALESCE_SEPARATORS when state == SEPARATOR (the only place where SEPARATOR is assigned to state follows a check for EXTRACT_DONT_COALESCE_SEPARATORS that jumps to the end of the function.) Tested by running test-util successfully. Follow up to: 206644aedeb8859801051ac170ec562c6a113a79
2015-08-30extract_first_word: Refactor allocation in empty argument caseFilipe Brandenburger
This covers the case where an argument is an empty string, such as ''. Instead of allocating the empty string in the individual conditions when state == VALUE, just always allocate it at the end of state == START, at which point we know we will have an argument. Tested that test-util keeps passing after the refactor. Follow up to: 14e685c29d5b317b815e3e9f056648027852b07e
2015-08-30util: make malloc0 ask calloc for one block of size nThomas Hindoe Paaboel Andersen
... instead of an array of n individual bytes. Silences a lot of warnings in smatch.
2015-08-28Merge pull request #1063 from poettering/dbus-interface-from-typeTom Gundersen
cgls/cgtop: a variety of modernizations
2015-08-28Merge pull request #1061 from poettering/pagerDaniel Mack
A few auto-pager improvements
2015-08-28cgtop: major modernizationsLennart Poettering
In preparation of the unified cgroup support, let's clean up cgtop: a) rework time code to be based on "nsec_t" rather than "struct timespec" b) Introduce long option --order= for selecting ordering c) count number of processes only in the main hierarchy, don't bother with the controller hierarchies. We don't allow orthogonal hierarchies in systemd anymore, hence there's no point to check the other hierarchies. d) Deal with non-monotonic cpuacct values (see #749) e) When sorting groups, don't do prefix compare when ordering by number of tasks, since this is not accumulative for all children. f) Actually make --cpu without parameter work g) Don't output control characters when we get them as input. Fixes #749.
2015-08-28core: add unit_dbus_interface_from_type() to unit-name.hLennart Poettering
Let's add a way to get the type-specific D-Bus interface of a unit from either its type or name to src/basic/unit-name.[ch]. That way we can share it with the client side, where it is useful in tools like cgls or machinectl. Also ports over machinectl to make use of this.
2015-08-28copy: add splice() based fallbackLennart Poettering
Apparently, sendfile() does not work between fifos and ttys, but splice() does, hence let's optionally fall back to that. This is useful to implement the fallback pager this way.
2015-08-27Merge pull request #1055 from poettering/dhcp-updatesTom Gundersen
Various networkd and dhcp updates
2015-08-27tree-wide: we place the opening bracket on the same line as the function nameLennart Poettering
Let's do this everywhere the same way.
2015-08-27Revert "sd-bus: do not connect to dbus-1 socket when kdbus is available"David Herrmann
This reverts commit d4d00020d6ad855d65d31020fefa5003e1bb477f. The idea of the commit is broken and needs to be reworked. We really cannot reduce the bus-addresses to a single address. We always will have systemd with native clients and legacy clients at the same time, so we also need both addresses at the same time.
2015-08-26basic: document that people shouldn't use refcnt.h without reasonLennart Poettering
refcnt.h only exists for cases where objects are simultaneously handled by different threads. Otherwise it should not be used. The only case where this applies is sd_bus, really, and pretty much none of our APIs, since we do not claim thread-safety for them.
2015-08-26time-util: add new get_timezone() call to get local timezoneLennart Poettering
Let's move the timedated-specific code to time-util.h and make it generic.
2015-08-24machined: introduce pseudo-machine ".host" refererring to the host systemLennart Poettering
Some of the operations machined/machinectl implement are also very useful when applied to the host system (such as machinectl login, machinectl shell or machinectl status), hence introduce a pseudo-machine by the name of ".host" in machined that refers to the host system, and may be used top execute operations on the host system with. This copies the pseudo-image ".host" machined already implements for image related commands. (This commit also adds a PK privilege for opening a PTY in a container, which was previously not accessible for non-root.)
2015-08-24machined: validate machine names at more placesLennart Poettering
When enumerating machines from /run, and when accepting machine names for operations, be more strict and always validate. Note that these checks are strictly speaking unnecessary, since enumeration happens only on the trusted /run...
2015-08-24util: make machine_name_is_valid() a macro and move it to hostname-util.hLennart Poettering
As it turns out machine_name_is_valid() does the exact same thing as hostname_is_valid() these days, as it just invoked that and checked the name length was < 64. However, hostname_is_valid() checks the length against HOST_NAME_MAX anyway (which is 64 on Linux), hence any additional check is redundant. We hence replace machine_name_is_valid() by a macro that simply maps it to hostname_is_valid() but sets the allow_trailing_dot parameter to false. We also move this this call to hostname-util.h, to the same place as the hostname_is_valid() declaration.
2015-08-24util: make hostname_is_valid() easier to readLennart Poettering
Add more comments, and rename some parameters and variables to be more expressive.
2015-08-21hostname-util: introduce new is_gateway_hostname() callLennart Poettering
This moves is_gateway() from nss-myhostname into the basic APIs, and makes it more like is_localhost(). Also, we rename it to is_gateway_hostname() to make it more expressive. Sharing this function in src/basic/ allows us to reuse the function for routing name requests in resolved (in a later commit).
2015-08-17Merge pull request #977 from richardmaw-codethink/machinectl-userns-login-v2Lennart Poettering
Fix machinectl login with containers in user namespaces (v2)
2015-08-17namespace helpers: Allow entering a UID namespaceRichard Maw
To be able to use `systemd-run` or `machinectl login` on a container that is in a private user namespace, the sub-process must have entered the user namespace before connecting to the container's D-Bus, otherwise the UID and GID in the peer credentials are garbage. So we extend namespace_open and namespace_enter to support UID namespaces, and we enter the UID namespace in bus_container_connect_{socket,kernel}. namespace_open will degrade to a no-op if user namespaces are not enabled in the kernel. Special handling is required for the setns call in namespace_enter with a user namespace, since transitioning to your own namespace is forbidden, as it would result in re-entering your user namespace as root. Arguably it may be valid to check this at the call site, rather than inside namespace_enter, but it is less code to do it inside, and if the intention of calling namespace_enter is to *be* in the target namespace, rather than to transition to the target namespace, it is a reasonable approach. The check for whether the user namespace is the same must happen before entering namespaces, as we may not be able to access /proc during the intermediate transition stage. We can't instead attempt to enter the user namespace and then ignore the failure from it being the same namespace, since the error code is not distinct, and we can't compare namespaces while mid-transition.
2015-08-16Merge pull request #908 from richardmaw-codethink/nspawn-path-escapes-v3Lennart Poettering
Allow arbitrary file paths to be passed to nspawn (v3)
2015-08-11 sd-bus: do not connect to dbus-1 socket when kdbus is availableKay Sievers
We should not fall back to dbus-1 and connect to the proxy when kdbus returns an error that indicates that kdbus is running but just does not accept new connections because of quota limits or something similar. Using is_kdbus_available() in libsystemd/ requires it to move from shared/ to libsystemd/. Based on a patch from David Herrmann: https://github.com/systemd/systemd/pull/886
2015-08-07strv: Add strv_shell_escapeRichard Maw
This modifies the strv in-place, replacing strings with their escaped version. It's mostly just a convenience function for when you need to join a strv together because it's passed as a string to something, and the separator needs escaping.
2015-08-07util: Add shell_escapeRichard Maw
This is for shell-style \ escaping rather than quoting, which while it has the same effect in produced shell commands, is not exclusively useful for shell commands. shell_escape would be useful for producing sed commands, as you would be able to \ escape the normal special characters, plus whichever argument separator was chosen; or it could be used to escape arguments passed to the overlayfs mount command.
2015-08-07strv: convert strv_split_quotes into a generic strv_split_extractRichard Maw
strv_split_extract is to strv_split_quotes as extract_first_word was to unquote_first_word. Now there's extract_first_word for extracting a single argument, extract_many_words for extracting a bounded number of arguments, and strv_split_extract for extracting an arbitrary number of arguments.
2015-08-07util: Allow non-separator coalescing parsing in extract_first_wordRichard Maw
If EXTRACT_DONT_COALESCE_SEPARATORS is passed, then leading separators, trailing separators and spans of multiple separators aren't skipped, and empty arguments from before, after or between separators may be extracted.
2015-08-07util: Don't interpret quotes by default in extract_first_wordRichard Maw
This adds an EXTRACT_QUOTES option to allow the previous behaviour, of not interpreting any character inside ' or " quotes as separators.
2015-08-07util: change unquote_*_word to extract_*_wordRichard Maw
It now takes a separators argument, which defaults to WHITESPACE if NULL is passed.
2015-08-07unquote_first_word: set *p=NULL on terminationRichard Maw
To add a flag to allow an empty string to be parsed as an argument, we need to be able to distinguish between the end of the string, and after the end of the string, so when we *do* reach the end, let's set *p to this state.
2015-08-05Merge branch 'hostnamectl-dot-v2'Zbigniew Jędrzejewski-Szmek
Manual merge of https://github.com/systemd/systemd/pull/751.
2015-08-05hostname-util: ignore case when checking if hostname is localhostZbigniew Jędrzejewski-Szmek
2015-08-05hostname-util: get rid of unused parameter of hostname_cleanup()Zbigniew Jędrzejewski-Szmek
All users are now setting lowercase=false.
2015-08-05hostname-util: add relax parameter to hostname_is_validZbigniew Jędrzejewski-Szmek
Tests are modified to check behaviour with relax and without relax. New tests are added for hostname_cleanup(). Tests are moved a new file (test-hostname-util) because there's now a bunch of them. New parameter is not used anywhere, except in tests, so there should be no observable change.
2015-08-05Merge pull request #877 from crawford/dhcp-private-options-v4Lennart Poettering
networkd: save private-zone DHCP options
2015-08-04Use getxpid syscall on alpha for raw_getpid()Matt Turner
Alpha does not have a getpid syscall, but rather has getxpid to match OSF/1.
2015-08-04smack-util: revise smack-util apis and add read smack attr apisWaLyong Cho
- Add smack xattr lookup table - Unify all of mac_smack_apply_xxx{_fd}() to mac_smack_apply() and mac_smack_apply_fd(). - Add smack xattr read apis similar with apply apis as mac_smack_read{_fd}().