Age | Commit message (Collapse) | Author |
|
This code is in a confusingly bad shape. Kill a bunch of global variables
and pass needed variables around in function calls.
|
|
Retrieve the handle to procfs in main(), and pass it functions
that need it. Kill the global variables.
Also, refactor lots of code in svg_title(). There's no need to access any
global variables from there either, and we really should return proper
errors from there as well.
|
|
Don't blindly exit() from random functions, but return a proper error
and upchain error conditions.
squash! bootchart: clean up control flow logic
When pread() returns "0", it's a read failure, so don't make the caller think
log_sample() was successful, return meaningful error code instead of 0.
|
|
Let the helper functions take care of the string message output.
|
|
Entropy Graph code doesn't handle the error condition if open() of /proc entry
fails. Moreover, the file is only opened once and only first sample will contain
the correct value because the return value of pread() is also not handled
properly and file is not re-opened. Fix both problems.
|
|
Correctly handle the potential failure of fdopen() (because of OOM, for instance)
after potentially successful open(). Prevent leaking open fd in such case.
|
|
If the kernel has no CONFIG_SCHED_DEBUG option set, systemd-bootchart produces
empty .svg file. The reason for this is very fragile file descriptor logic in
log_sample() and main() (/* do some cleanup, close fd's */ block). There are
many places where file descriptors are closed on failure (missing SCHED_DEBUG
provokes it), but there are several problems with it:
- following iterations in the loop see that the descriptor is non zero and do
not open the corresponding file again;
- "some cleanup" code closes already closed files and the descriptors are reused
already, in particular for resulting .svg file;
- static "vmstat" and "schedstat" variables in log_sample() made the situation
even worse.
These are the strace fragments:
[...]
close(7) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
pread(7, 0xbea60a2c, 4095, 0) = -1 EBADF (Bad file descriptor)
close(7) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
pread(7, 0xbea60a2c, 4095, 0) = -1 EBADF (Bad file descriptor)
close(7) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
getdents64(4, /* 0 entries */, 32768) = 0
clock_gettime(CLOCK_MONOTONIC, {24, 783843501}) = 0
nanosleep({0, 5221792}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {24, 789726835}) = 0
lseek(4, 0, SEEK_SET) = 0
pread(5, "nr_free_pages 52309\nnr_alloc_bat"..., 4095, 0) = 685
pread(6, "version 15\ntimestamp 4294939775\n"..., 4095, 0) = 86
getdents64(4, /* 99 entries */, 32768) = 2680
pread(7, 0xbea60a2c, 4095, 0) = -1 EBADF (Bad file descriptor)
close(7) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
pread(8, 0xbea60a2c, 4095, 0) = -1 EBADF (Bad file descriptor)
close(8) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
pread(9, 0xbea60a2c, 4095, 0) = -1 EBADF (Bad file descriptor)
close(9) = -1 EBADF (Bad file descriptor)
[...]
where it obviously tries to close same and reused decriptors many times, also
passing return code "-1" instead of descriptor...
[...]
close(7) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
pipe2([7, 8], O_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb6fd0068) = 192
close(8) = 0
fcntl64(7, F_SETFD, 0) = 0
fstat64(7, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6fd2000
read(7, "[ 0.074507] calling vfp_init"..., 4096) = 4096
[...]
read(7, "s)\n[ 6.228910] UBIFS: reserve"..., 4096) = 4096
read(7, "trary Executable File Formats Fi"..., 4096) = 1616
read(7, "", 4096) = 0
close(7) = 0
wait4(192, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 192
|
|
Reported-by: tfirg_ on IRC
|
|
In Debian and rawhide Fedora, which have CONFIG_SCHEDSTATS=n,
bootchart creates empty files in /run/log before printing an error.
Stop doing that.
Moreover this duplicated part of the code doesn't even have error checking
so there is no error avoided by doing this early.
Reported-by: tfirg_ on IRC
|
|
|
|
|
|
For daemons which have a main configuration file, there's
little reason for the administrator to use configuration snippets.
They are useful for packagers which need to override settings, but
we shouldn't advertise that as the main way of configuring those
services.
https://bugs.freedesktop.org/show_bug.cgi?id=89397
|
|
If we have less samples than expected, systemd-bootchart will crash.
|
|
Commit 6e1bf7ab99 used the wrong directory; we need rootlibexecdir, not
rootlibdir, as the latter is something like /lib/x86_64-linux-gnu/ on
multi-arch systems.
https://launchpad.net/bugs/1423867
|
|
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
|
|
When booting with systemd-bootchart, default to call the systemd binary
rather than the init binary on disk, which might be another init system.
Collecting data only works with booting systemd.
|
|
|
|
|
|
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
|
|
Basically:
find . -name '*.[ch]' | while read f; do perl -i.mmm -e \
'local $/;
local $_=<>;
s/log_(debug|info|notice|warning|error|emergency)\("([^"]*)%s"([^;]*),\s*strerror\(-?([->a-zA-Z_]+)\)\);/log_\1_errno(\4, "\2%m"\3);/gms;print;' \
$f; done
Plus manual indentation fixups.
|
|
It corrrectly handles both positive and negative errno values.
|
|
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
|
|
Spotted with coverity. If parsing both /etc/os-release and
/usr/lib/os-release fails then null would be passed on. The calls
to parse the two files are allowed to fail. A empty /etc may not
have had the /etc/os-release symlink restored yet and we just
try again in the loop. If for whatever reason that does not happen
then we now pass on 'n/a' instead of null.
|
|
Cpu's are assigned normally, so starting at 0, so the MAX_CPU index will
always be one smaller than the actual number.
Found with Coverity.
|
|
Found by coverity. Fixes: CID#996314 and #996312
|
|
Found by coverity. Fixes: CID#996409
|
|
Coverity warned that we have already dereferenced ps->sample before
null-checking it. I suspect that's not really the issue and that
the check is checking the wrong variable.
Likely the oom-check should be on the just allocated ps->sample->next.
Found by coverity. Fixes: CID#1237765
|
|
|
|
|
|
|
|
|
|
getopt is usually good at printing out a nice error message when
commandline options are invalid. It distinguishes between an unknown
option and a known option with a missing arg. It is better to let it
do its job and not use opterr=0 unless we actually want to suppress
messages. So remove opterr=0 in the few places where it wasn't really
useful.
When an error in options is encountered, we should not print a lengthy
help() and overwhelm the user, when we know precisely what is wrong
with the commandline. In addition, since help() prints to stdout, it
should not be used except when requested with -h or --help.
Also, simplify things here and there.
|
|
We always read system uptime before log start time. So the uptime
should be always smaller number, except it includes system suspend
time. It seems better to ask for --rel and exit() than try to be
smart and try to recovery from this situation or generate huge
messy graphs.
|
|
* systemd-bootchart always parses /proc/uptime, although the
information is unnecessary when --rel specified
* use /proc/uptime is overkill, since Linux 2.6.39 we have
clock_gettime(CLOCK_BOOTTIME, ...). The backend on kernel side is
get_monotonic_boottime() in both cases.
* main() uses "if (graph_start <= 0.0)" to detect that /proc is
available.
This is fragile solution as graph_start is always smaller than zero
on all systems after suspend/resume (e.g. laptops), because in this
case the system uptime includes suspend time and uptime is always
greater number than monotonic time. For example right now difference
between uptime and monotonic time is 37 hours on my laptop.
Note that main() calls log_uptime() (to parse /proc/uptime) for each
sample when it believes that /proc is not available. So on my laptop
systemd-boochars spends all live with /proc/uptime parsing +
nanosleep(), try
strace /usr/lib/systemd/systemd-bootchart
to see the never ending loop.
This patch uses access("/proc/vmstat", F_OK) to detect procfs.
|
|
In practice this shouldn't make much difference, but
sometimes our headers might be newer, and we want to
test them.
|
|
|
|
Special care is needed so that we get an error message if the
file failed to parse, but not when it is missing. To avoid duplicating
the same error check in every caller, add an additional 'warn' boolean
to tell config_parse whether a message should be issued.
This makes things both shorter and more robust wrt. to error reporting.
|
|
|
|
since 376cd3b89c62f580a6f576cecfbbb28d3944118f LIST_FIND_TAIL accepts
an empty list. That removed an assert in LIST_FIND_TAIL and we now
theoretically risk a null pointer deref. This adds the assert directly
to protect against that.
|
|
In programs like eog and gimp the transparant background did not
look very good.
Similar fix from the one done in systemd-analyze (418e3750)
|
|
The file should have been in /usr/lib/ in the first place, since it
describes the OS container in /usr (and not the configuration in /etc),
hence, let's support os-release files in /usr/lib as fallback if no
version in /etc exists, following the usual override logic.
A prior commit already enabled tmpfiles to create /etc/os-release as a
symlink to /usr/lib/os-release should it be missing, thus providing nice
compatibility with applications only checking in /etc.
While it's probably a good idea if all apps check both locations via a
fallback logic, it is only necessary in the early boot process, as long
as the /etc/os-release symlink has not been restored, in case we boot
with an empty /etc.
|
|
|
|
|
|
|
|
As pointed-out by clang -Wunreachable-code.
No behaviour changes.
|
|
|
|
|
|
In trying to track down a stupid linker bug, I noticed a bunch of
memset() calls that should be using memzero() to make it more "obvious"
that the options are correct (i.e. 0 is not the length, but the data to
set). So fix up all current calls to memset(foo, 0, length) to
memzero(foo, length).
|
|
Directly before the changed line there is:
while ((parent->next_ps && parent->pid != ps->ppid))
parent = parent->next_ps;
which looks one element ahead of the list, hence we can rely on parent
being non null here.
If 'parent' were NULL at that while loop already, it would crash as we're
dereferencing 'parent' when checking for next_ps already.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
|