Age | Commit message (Collapse) | Author |
|
The test-barrier binary uses real-time alarms and timeouts to test for
races in the thread-barrier implementation. Hence, if your system is under
high load and your scheduler decides to not run test-barrier for
>BASE_TIME, then the tests are likely to fail.
Two options:
1) Increase BASE_TIME. This will make the test take significantly longer
for no real good. Furthermore, it is still not guaranteed that the
task is scheduled.
2) Don't rely on real-time timers, but use explicit synchronization. This
would basically test one barrier implementation with another.. kinda
ironic.. but maybe something worth looking into.
3) Disable test-barrier by default.
This patch chooses option 3) and makes sure test-barrier only runs if you
pass any argument.
Side note:
test-barrier is written in a way that if it fails under load, but
does not on idle systems, then it is very _unlikely_ that the
barrier implementation is the culprit. Hence, it makes little
sense to run it under load, anyway. It will not improve the test
coverage of barrier.c, but rather the coverage of the test itself.
|
|
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
Coverity seems to think that we can later end up with the "them"
fd having a negative value. Even after a succesful barrier_create.
Add some test to verify that the constructor went well. If coverity
still complains then it must mean that it thinks the the value is
overwritten later.
|
|
The barrier_wait_next_twice* test-cases run:
Parent: Child:
set_alarm(10) sleep_for(1);
... set_alarm(1);
sleep_for(2) ...
Therefore, the parent exits after 2+ periods, the client's alarm fires
after 2+ periods. This race turns out to be lost by the child on other
machines, so avoid it by increasing the parent's sleep-interval to 4. This
way, the client has 2 periods to run the barrier test, which is far more
than enough.
|
|
Avoid using msecs in favor of usec_t. This is more consistent with the
other parts of systemd and avoids the confusion between msec and usec. We
always use usecs, end of story.
|
|
Explicitly initalize descriptors using explicit assignment like
bus_error. This makes barriers follow the same conventions as
everything else and makes things a bit simpler too.
Rename barier_init to barier_create so it is obvious that it is
not about initialization.
Remove some parens, etc.
|
|
The "Barrier" object is a simple inter-process barrier implementation. It
allows placing synchronization points and waiting for the other side to
reach it. Additionally, it has an abortion-mechanism as second-layer
synchronization to send abortion-events asynchronously to the other side.
The API is usually used to synchronize processes during fork(). However,
it can be extended to pass state through execve() so you could synchronize
beyond execve().
Usually, it's used like this (error-handling replaced by assert() for
simplicity):
Barrier b;
r = barrier_init(&b);
assert_se(r >= 0);
pid = fork();
assert_se(pid >= 0);
if (pid == 0) {
barrier_set_role(&b, BARRIER_CHILD);
...do child post-setup...
if (CHILD_SETUP_FAILED)
exit(1);
...child setup done...
barrier_place(&b);
if (!barrier_sync(&b)) {
/* parent setup failed */
exit(1);
}
barrier_destroy(&b); /* redundant as execve() and exit() imply this */
/* parent & child setup successful */
execve(...);
}
barrier_set_role(&b, BARRIER_PARENT);
...do parent post-setup...
if (PARENT_SETUP_FAILED) {
barrier_abort(&b); /* send abortion event */
barrier_wait_abortion(&b); /* wait for child to abort (exit() implies abortion) */
barrier_destroy(&b);
...bail out...
}
...parent setup done...
barrier_place(&b);
if (!barrier_sync(&b)) {
...child setup failed... ;
barrier_destroy(&b);
...bail out...
}
barrier_destroy(&b);
...child setup successfull...
This is the most basic API. Using barrier_place() to place barriers and
barrier_sync() to perform a full synchronization between both processes.
barrier_abort() places an abortion barrier which superceeds any other
barriers, exit() (or barrier_destroy()) places an abortion-barrier that
queues behind existing barriers (thus *not* replacing existing barriers
unlike barrier_abort()).
This example uses hard-synchronization with wait_abortion(), sync() and
friends. These are all optional. Barriers are highly dynamic and can be
used for one-way synchronization or even no synchronization at all
(postponing it for later). The sync() call performs a full two-way
synchronization.
The API is documented and should be fairly self-explanatory. A test-suite
shows some special semantics regarding abortion, wait_next() and exit().
Internally, barriers use two eventfds and a pipe. The pipe is used to
detect exit()s of the remote side as eventfds do not allow that. The
eventfds are used to place barriers, one for each side. Barriers itself
are numbered, but the numbers are reused once both sides reached the same
barrier, thus you cannot address barriers by the index. Moreover, the
numbering is implicit and we only store a counter. This makes the
implementation itself very lightweight, which is probably negligible
considering that we need 3 FDs for a barrier..
Last but not least: This barrier implementation is quite heavy. It's
definitely not meant for fast IPC synchronization. However, it's very easy
to use. And given the *HUGE* overhead of fork(), the barrier-overhead
should be negligible.
|