summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2015-06-17Merge pull request #240 from kaysievers/wipKay Sievers
build-sys: hide magic section variables from exported symbols
2015-06-17Merge pull request #238 from dvdhrm/udev-epollKay Sievers
udev: don't close FDs before dropping them from epoll
2015-06-17build-sys: hide magic section variables from exported symbolsKay Sievers
https://github.com/systemd/systemd/issues/234
2015-06-17udev: don't close FDs before dropping them from epollDavid Herrmann
Make sure we never close fds before we drop their related event-source. This will cause horrible disruptions if the fd-num is re-used by someone else. Under normal conditions, this should not cause any problems as the close() will drop the fd from the epoll-set automatically. However, this changes if you have any child processes with a copy of that fd. This fixes issue #163. Background: If you create an epoll-set via epoll_create() (lets call it 'EFD') you can add file-descriptors to it to watch for events. Whenever you call EPOLL_CTL_ADD on a file-descriptor you want to watch, the kernel looks up the attached "struct file" pointer, that this FD refers to. This combination of the FD-number and the "struct file" pointer is used as key to link it into the epoll-set (EFD). This means, if you duplicate your file-descriptor, you can watch this file-descriptor, too (because the duplicate will have a different FD-number, hence, the combination of FD-number and "struct file" is different as before). If you want to stop watching an FD, you use EPOLL_CTL_DEL and pass the FD to the kernel. The kernel again looks up your file-descriptor in your FD-table to find the linked "struct file". This FD-number and "struct file" combination is then dropped from the epoll-set (EFD). Last, but not least: If you close a file-descriptor that is linked to an epoll-set, the kernel does *NOTHING* regarding the epoll-set. This is a vital observation! Because this means, your epoll_wait() calls will still return the metadata you used to watch/subscribe your file-descriptor to events. There is one exception to this rule: If the file-descriptor that you just close()ed was the last FD that referred to the underlying "struct file", then _all_ epoll-set watches/subscriptions are destroyed. Hence, if you never dup()ed your FD, then a simple close() will also unsubscribe it from any epoll-set. With this in mind, lets look at fork(): Assume you have an epoll-set (EFD) and a bunch of FDs subscribed to events on that EFD. If you now call fork(), the new process gets a copy of your file-descriptor table. This means, the whole table is copied and the "struct file" reference of each FD is increased by 1. It is important to notice that the FD-numbers in the child are exactly the same as in the parent (eg., FD #5 in the child refers to the same "struct file" as FD #5 in the parent). This means, if the child calls EPOLL_CTL_DEL on an FD, the kernel will look up the linked "struct file" and drop the FD-number and "struct file" combination from the epoll-set (EFD). However, this will effectively drop the subscription that was installed by the parent. To sum up: even though the child gets a duplicate of the EFD and all FDs, the subscriptions in the EFD are *NOT* duplicated! Now, with this in mind, lets look at what udevd does: Udevd has a bunch of file-descriptors that it watches in its sd-event main-loop. Whenever a uevent is received, the event is dispatched on its workers. If no suitable worker is present, a new worker is fork()ed to handle the event. Inside of this worker, we try to free all resources we inherited. However, the fork() call is done from a call-stack that is never rewinded. Therefore, this call stack might own references that it drops once it is left. Those references we cannot deduce from the fork()'ed process; effectively causing us to leak objects in the worker (eg., the call to sd_event_dispatch() that dispatched our uevent owns a reference to the sd_event object it used; and drops it again once the function is left). (Another example is udev_monitor_ref() for each 'worker' that is also inherited by all children; thus keeping the udev-monitor and the uevent-fd alive in all children (which is the real cause for bug #163)) (The extreme variant is sd_event_source_unref(), which explicitly keeps event-sources alive, if they're currently dispatched, knowing that the dispatcher will free the event once done. But if the dispatcher is in the parent, the child will never ever free that object, thus leaking it) This is usually not an issue. However, if such an object has a file-descriptor embedded, this FD is left open and never closed in the child. In manager_exit(), if we now destroy an object (i.e., close its embedded file-descriptor) before we destroy its related sd_event_source, then sd-event will not be able to drop the FD from the epoll-set (EFD). This is, because the FD is no longer valid at the time we call EPOLL_CTL_DEL. Hence, the kernel cannot figure out the linked "struct file" and thus cannot remove the FD-number plus "struct file" combination; effectively leaving the subscription in the epoll-set. Since we leak the uevent-fd in the children, they retain a copy of the FD pointing to the same "struct file". Thus, the EFD-subscription are not automatically removed by close() (as described above). Therefore, the main daemon will still get its metadata back on epoll_watch() whenever an event occurs (even though it already freed the metadata). This then causes the free-after-use bug described in #163. This patch fixes the order in which we destruct objects and related sd-event-sources. Some open questions remain: * Why does source_io_unregister() not warn on EPOLL_CTL_DEL failures? This really needs to be turned into an assert_return(). * udevd really should not leak file-descriptors into its children. Fixing this would *not* have prevented this bug, though (since the child-setup is still async). It's non-trivial to fix this, though. The stack-context of the caller cannot be rewinded, so we cannot figure out temporary refs. Maybe it's time to exec() the udev-workers? * Why does the kernel not copy FD-subscriptions across fork()? Or at least drop subscriptions if you close() your FD (it uses the FD-number as key, so it better subscribe to it)? Or it better used FD+"struct file_table*"+"struct file*" as key to not allow the childen to share the subscription table.. *sigh* Seems like we have to live with that API forever.
2015-06-16Merge pull request #231 from tixxdz/nspawn-userns-fixes-2Lennart Poettering
nspawn: check if kernel supports userns as early as possible
2015-06-16nspawn: check if kernel supports userns as early as possibleDjalal Harouni
If the kernel do not support user namespace then one of the children created by nspawn parent will fail at clone(CLONE_NEWUSER) with the generic error EINVAL and without logging the error. At the same time the parent may also try to setup the user namespace and will fail with another error. To improve this, check if the kernel supports user namespace as early as possible.
2015-06-16tmpfiles: silently ignore failed removal of btrfs submount from non-dirTom Gundersen
This fixes: Jun 16 16:00:20 tomegun-x2402 systemd-tmpfiles[233]: rm_rf(/var/lib/machines/.#fedora.lck): Not a directory Jun 16 16:00:20 tomegun-x2402 systemd-tmpfiles[233]: rm_rf(/var/lib/machines/.#Fedora-Cloud-Base-20141203-21.x86_64.raw.lck): Not a directory
2015-06-16Merge pull request #197 from dvdhrm/hashmapMichal Schmidt
hashmap: fix iterators to not skip entries
2015-06-16Merge pull request #223 from ronnychevalier/rc/warning_va_startDavid Herrmann
signal-util: fix incorrect argument of va_start
2015-06-16Merge pull request #222 from utezduyar/mem-leak-on-bus-errorDaniel Mack
sd-bus: use proper cleanup macro
2015-06-16signal-util: fix incorrect argument of va_startRonny Chevalier
The last argument of the function before the vargs is "old" not "how". warning: second parameter of ‘va_start’ not last named argument
2015-06-16sd-bus: use proper cleanup macroUmut Tezduyar Lindskog
2015-06-16Merge pull request #218 from poettering/dual-timestamp-nullDaniel Mack
everywhere: actually make use of DUAL_TIMESTAMP_NULL macro
2015-06-16Merge pull request #219 from poettering/logind-dockedDaniel Mack
logind: expose "Docked" bool as property on the bus
2015-06-16logind: cast close() call to (void)Lennart Poettering
2015-06-16logind: expose "Docked" bool as property on the busLennart Poettering
We know the state anyway, let's expose it in the bus. It's useful for debugging at least, but it might be useful for DEs too.
2015-06-16everywhere: actually make use of DUAL_TIMESTAMP_NULL macroLennart Poettering
Let's use it as initializer where appropriate.
2015-06-15Merge pull request #214 from poettering/signal-rework-2Lennart Poettering
everywhere: port everything to sigprocmask_many() and friends
2015-06-15everywhere: port everything to sigprocmask_many() and friendsLennart Poettering
This ports a lot of manual code over to sigprocmask_many() and friends. Also, we now consistly check for sigprocmask() failures with assert_se(), since the call cannot realistically fail unless there's a programming error. Also encloses a few sd_event_add_signal() calls with (void) when we ignore the return values for it knowingly.
2015-06-15tmpfiles: automatically remove old machine snapshots at bootLennart Poettering
Remove old temporary snapshots, but only at boot. Ideally we'd have "self-destroying" btrfs snapshots that go away if the last last reference to it does. To mimic a scheme like this at least remove the old snapshots on fresh boots, where we know they cannot be referenced anymore. Note that we actually remove all temporary files in /var/lib/machines/ at boot, which should be safe since the directory has defined semantics. In the root directory (where systemd-nspawn --ephemeral places snapshots) we are more strict, to avoid removing unrelated temporary files. This also splits out nspawn/container related tmpfiles bits into a new tmpfiles snippet to systemd-nspawn.conf
2015-06-15tmpfiles: make sure "R" lines also remove subvolumesLennart Poettering
2015-06-15util: when creating temporary file names, allow including extra id string in itLennart Poettering
This adds a "char *extra" parameter to tempfn_xxxxxx(), tempfn_random(), tempfn_ranomd_child(). If non-NULL this string is included in the middle of the newly created file name. This is useful for being able to distuingish the kind of temporary file when we see one. This also adds tests for the three call. For now, we don't make use of this at all, but port all users over.
2015-06-15btrfs-util: when snapshotting make sure we don't descent into subvolumes we ↵Lennart Poettering
just created We already had a safety check in place that we don't end up descending to the original subvolume again, but we also should avoid descending in the newly created one. This is particularly important if we make a snapshot below its source, like we do in "systemd-nspawn --ephemeral -D /". Closes https://bugs.freedesktop.org/show_bug.cgi?id=90803
2015-06-15firewall: rename fw-util.[ch] → firewall-util.[ch]Daniel Mack
The names fw-util.[ch] are too ambiguous, better rename the files to firewall-util.[ch]. Also rename the test accordingly.
2015-06-15Merge pull request #180 from ronnychevalier/rc/coverity_cid_1304686Lennart Poettering
login: fix potential null pointer dereference
2015-06-15Merge pull request #205 from endocode/iaguis/seccomp-v2Lennart Poettering
nspawn: make seccomp loading errors non-fatal
2015-06-15nspawn: make seccomp loading errors non-fatalIago López Galeiras
seccomp_load returns -EINVAL when seccomp support is not enabled in the kernel [1]. This should be a debug log, not an error that interrupts nspawn. If the seccomp filter can't be set and audit is enabled, the user will get an error message anyway. [1]: http://man7.org/linux/man-pages/man2/prctl.2.html
2015-06-15login: fix potential null pointer dereferenceRonny Chevalier
Fix CID 1304686: Dereference after null check (FORWARD_NULL) However, this commit does not fix any bug in logind. It helps to keep the elect_display_compare() function generic.
2015-06-14Merge pull request #144 from teg/udev-spawn-log-less-2Kay Sievers
udevd: event - don't log about failures of spawn processes when this …
2015-06-14Merge pull request #196 from dvdhrm/bus-map-propsTom Gundersen
tree-wide: fix memory leaks in users of bus_map_all_properties()
2015-06-14hashmap: fix iterators to not skip entriesDavid Herrmann
Currently, the HASHMAP iterators stop at the first NULL entry in a hashmap. This is non-obvious and breaks users like sd-device, which legitimately store NULL values in a hashmap. Fix all the iterators by taking a pointer to the value storage, instead of returning it. The iterators now return a boolean that tells whether the end of the list was reached. Current users of HASHMAP_FOREACH() are *NOT* changed to explicitly check for NULL. If it turns out, there were users that inserted NULL into hashmaps, but didn't properly check for it during iteration, then we really want to find those and fix them.
2015-06-14tree-wide: fix memory leaks in users of bus_map_all_properties()David Herrmann
If you use bus_map_all_properties(), you must be aware that it might touch output variables even though it may fail. This is, because we parse many different bus-properties and cannot tell how to clean them up, in case we fail deep down in the parser. Fix all callers of bus_map_all_properties() to correctly cleanup any context structures at all times.
2015-06-14test-netlink-manual: typo fixThomas Hindoe Paaboel Andersen
No functional change, but looked weird.
2015-06-14Merge pull request #183 from ssahani/netDavid Herrmann
Improve tun/tap logging by using the new log_*errno*() functions that set 'errno' explicitly. Also fix a bunch of incorrect errno/r confusions.
2015-06-14Merge pull request #189 from teg/rtnl-renameDavid Herrmann
Rename sd_rtnl to sd_netlink to prepare for further netlink-protocol support. Anything rtnl specific still uses the sd_rtnl prefix, but the generic parts (including the bus and message objects) are now called sd_netlink.
2015-06-14networkd: tuntap improve loggingSusant Sahani
Replaces strerror() usage with log_netdev_error_errno()
2015-06-13sd-netlink: socket - move some functions from main source fileTom Gundersen
2015-06-13sd-netlink: message - split up source fileTom Gundersen
Split netlink-socket.c and rtnl-message.c from netlink-message.c.
2015-06-13sd-netlink: drop the write-queueTom Gundersen
AF_NETLINK is not write-buffered, so this was actually never used.
2015-06-13sd-netlink: rename from sd-rtnlTom Gundersen
2015-06-12Merge pull request #167 from keszybz/line-oriented-ima-setupkeszybz
ima-setup: write policy one line at a time
2015-06-12selinux: whitespace fixesLennart Poettering
2015-06-12Merge pull request #173 from mischief/ipforwarding-3Lennart Poettering
IPForwarding=kernel v3
2015-06-12core: fix CID 996302Susant Sahani
CID 996302: Error handling issues (CHECKED_RETURN)
2015-06-11networkd: create "kernel" setting for IPForwardingNick Owens
In 5a8bcb674f71a20e95df55319b34c556638378ce, IPForwarding was introduced to set forwarding flags on interfaces in .network files. networkd sets forwarding options regardless of the previous setting, even if it was set by e.g. sysctl. This commit creates a new option for IPForwarding, "kernel", that preserves the sysctl settings rather than always setting them. See https://bugs.freedesktop.org/show_bug.cgi?id=89509 for the initial bug report.
2015-06-11Merge pull request #171 from teg/rtnl-broadcast-2David Herrmann
sd-rtnl: make joining broadcast groups implicit
2015-06-11sd-rtnl: make joining broadcast groups implicitTom Gundersen
2015-06-11Merge pull request #143 from teg/networkd-packets-per-slave-modeLennart Poettering
networkd: bond - only set packets_per_slave on balance-rr mode
2015-06-11Merge pull request #156 from filbranden/journal_leading_whitespaceLennart Poettering
journald: do not strip leading whitespace from messages
2015-06-11kmod-setup: don't print warning on -ENOSYSDaniel Mack
-ENOSYS is returned from kmod_module_probe_insert_module() if a module isn't available, not -ENOENT. Don't spit out a warning in that case unless the warn_if_unavailable flag is set. Also factor out the condition into an own variable for better readability.