[CMake] cmake on multicore interix'en
Markus Duft
markus.duft at salomon.at
Wed Feb 17 11:29:26 EST 2010
Brad King wrote:
> Markus Duft wrote:
>> CMake's implementation of child process handling doesn't work
>> reliably on multicore Interix. It seems that every other SIGCHLD is lost
>
> Is this a known problem on that platform, independent of CMake?
It is independent of CMake, yes. It is not widely known because (I
guess) I'm one of the very few people really _using_ this platform for
something productive: cross compiling to native Win32 (hahaha, I know,
don't tell me that CMake supports Win32; we have a huge bunch of
auto*-based stuff that needs a POSIX environment...). Making CMake
cross compile from Interix to Win32 using parity (parity.sf.net) is
next on my agenda...
>
> The ProcessUNIX.c implementation is for POSIX platforms, which clearly
> define SIGCHLD semantics.
Yeah, Interix is (supposed to be) POSIX compliant, and hey, it works
_most of the time_. What gives me headaches are the few times it
doesn't... and all of this (both of my problems) happens only on
multi-core machines. I am currently reporting these issues, but M$
support is so... you know the deal ;)
>
>> somewhere along the way. I (printf-)debugged CMake a little during
>> bootstrap, and it seems that at random points in time SIGCHLD is lost,
>
> Can you print out the state of signal masks?
How can I do that? I'm not really that deep into this topic :) but
I'll read some man pages to figure it out.
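For reference, a minimal sketch of how the SIGCHLD state could be printed from C, assuming a single-threaded process (in a threaded one, pthread_sigmask() is the call to query instead). The helper names sigchld_is_blocked() and print_sigchld_state() are hypothetical, not part of kwsys. sigprocmask() with a NULL new set only reads the current mask, and sigpending() shows whether a SIGCHLD is waiting behind that mask, which helps tell a blocked signal apart from one that was never delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: 1 if SIGCHLD is blocked in this thread's mask,
 * 0 if not, -1 on error. A NULL new set makes sigprocmask() a pure query. */
static int sigchld_is_blocked(void)
{
    sigset_t mask;
    if (sigprocmask(SIG_BLOCK, NULL, &mask) != 0)
        return -1;
    return sigismember(&mask, SIGCHLD);
}

/* Hypothetical helper: prints mask, pending state and disposition of SIGCHLD. */
static void print_sigchld_state(FILE *out, const char *where)
{
    sigset_t pending;
    struct sigaction sa;
    int is_pending = -1;
    const char *disposition = "unknown";

    /* A blocked SIGCHLD shows up in the pending set instead of being delivered. */
    if (sigpending(&pending) == 0)
        is_pending = sigismember(&pending, SIGCHLD);

    /* A NULL new action makes sigaction() only report the current one. */
    if (sigaction(SIGCHLD, NULL, &sa) == 0) {
        if (sa.sa_flags & SA_SIGINFO)
            disposition = "handler installed (SA_SIGINFO)";
        else if (sa.sa_handler == SIG_DFL)
            disposition = "default";
        else if (sa.sa_handler == SIG_IGN)
            disposition = "ignored";
        else
            disposition = "handler installed";
    }

    fprintf(out, "[%s] SIGCHLD blocked=%d pending=%d disposition=%s\n",
            where, sigchld_is_blocked(), is_pending, disposition);
}
```

Calling print_sigchld_state(stderr, "before select") just before the select() on the signal pipe would show whether SIGCHLD is blocked or pending at the moment the hang starts.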
>
>> and CMake locks up in a select() call on the signal pipe (SIGCHLD is
>> lost, so nobody will write to the signal pipe).
>
> The "signal pipe" approach is a standard way to implement race-free
> handling of SIGCHLD while blocking in select().
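The signal pipe (often called the self-pipe trick) can be sketched as follows. This is a generic illustration, not the ProcessUNIX.c code; sigchld_pipe, install_sigchld_pipe() and wait_child_via_pipe() are hypothetical names. The handler only write()s one byte, which is async-signal-safe, and the waiter checks waitpid(..., WNOHANG) before every select(). A child that exits between that check and the select() therefore still leaves a byte in the pipe, provided the kernel actually delivers SIGCHLD, which is exactly what appears to fail here. Because ordinary signals are not queued, one byte may stand for several exited children, so the pipe is drained and waitpid() is rechecked on every wakeup.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global: [0] is select()ed on, [1] is written by the handler. */
static int sigchld_pipe[2] = { -1, -1 };

/* Async-signal-safe handler: writes one byte and restores errno. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    char byte = 1;
    ssize_t n = write(sigchld_pipe[1], &byte, 1); /* EAGAIN: a wakeup is already queued */
    (void)n;
    (void)signo;
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* Creates the pipe and installs the handler; returns the read end or -1. */
static int install_sigchld_pipe(void)
{
    struct sigaction sa;
    if (pipe(sigchld_pipe) != 0 || set_nonblocking(sigchld_pipe[0]) != 0 ||
        set_nonblocking(sigchld_pipe[1]) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP; /* only report terminated children */
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;
    return sigchld_pipe[0];
}

/* Blocks until `pid` has terminated; returns 0 and stores its status, or -1. */
static int wait_child_via_pipe(int rfd, pid_t pid, int *status)
{
    char buf[64];
    for (;;) {
        /* Check first: an already exited child is reaped here, and one that
           exits after this check leaves a byte for select() to see. */
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;
        if (r < 0)
            return -1;

        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(rfd, &readable);
        if (select(rfd + 1, &readable, NULL, NULL, NULL) < 0 && errno != EINTR)
            return -1;

        /* Drain every pending byte; one wakeup may cover several children. */
        while (read(rfd, buf, sizeof buf) > 0) {
        }
    }
}
```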
>
>> I thought of introducing a crude timeout when select()ing on the
>> signal pipe, then checking whether the process is still alive
>> (wait()), and select()ing again if it is. What do you think?
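The timeout idea can be sketched roughly like this. wait_child_with_fallback() is a hypothetical name and poll_ms an arbitrary tuning value, not a kwsys setting. The signal pipe stays the fast path, while the bounded select() means a lost SIGCHLD costs at most one poll interval instead of a hang. waitpid(..., WNOHANG) is used in place of a plain wait() so the liveness check never blocks and only looks at the child in question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Waits for `pid`, normally woken by a byte on the signal pipe `rfd`,
 * but re-checks with a non-blocking waitpid() at least every `poll_ms`
 * milliseconds in case a SIGCHLD never arrives.
 * Returns 0 and fills *status once the child has exited, -1 on error. */
static int wait_child_with_fallback(int rfd, pid_t pid, long poll_ms, int *status)
{
    char buf[64];
    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0; /* child reaped */
        if (r < 0)
            return -1;

        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(rfd, &readable);
        /* select() may modify the timeout, so rebuild it on every pass. */
        struct timeval tv;
        tv.tv_sec = poll_ms / 1000;
        tv.tv_usec = (poll_ms % 1000) * 1000;

        int n = select(rfd + 1, &readable, NULL, NULL, &tv);
        if (n < 0 && errno != EINTR)
            return -1;
        if (n > 0) {
            /* Consume the wakeup bytes; readable means read() won't block. */
            if (read(rfd, buf, sizeof buf) < 0 && errno != EAGAIN && errno != EINTR)
                return -1;
        }
        /* n == 0: timed out, so fall through to the waitpid() check. */
    }
}
```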
>
> If select() is broken (your second problem) then there is no point
> in pursuing this code path further. Instead modify the polling
> code path to use a non-blocking waitpid() instead of looking at
> the signal pipe.
It seems that I'm not hit by the select() problem, as there is already
a "select has lied" branch in that code path that catches exactly my
select() problem.
But yes, maybe it would be easier to implement the waitpid() approach
in the non-blocking code path. I'll have a look at that.
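Brad's suggestion, dropping the signal pipe entirely in the polling path, could look roughly like this sketch. poll_child() and its parameters are hypothetical; the actual change would go into the polling code path of ProcessUNIX.c. Only waitpid(..., WNOHANG) and nanosleep() are involved, so there is nothing a lost SIGCHLD could break; the price is up to interval_ms of extra latency per child.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Polls `pid` with a non-blocking waitpid() every `interval_ms`, sleeping
 * at most `max_polls` times. No SIGCHLD handler or signal pipe is used.
 * Returns 1 if the child terminated (*status filled), 0 if it is still
 * running when the polls are used up, -1 on error. */
static int poll_child(pid_t pid, long interval_ms, int max_polls, int *status)
{
    struct timespec delay;
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0;; ++i) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;  /* reaped */
        if (r < 0)
            return -1; /* e.g. ECHILD: not our child */
        if (i >= max_polls)
            return 0;  /* still running */
        /* A signal may cut the sleep short; that only makes the next poll earlier. */
        nanosleep(&delay, NULL);
    }
}
```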
>
>> The second problem I have concerns a broken select(). I tried to
>> work around it by setting KWSYSPE_USE_SELECT, which initially didn't
>> work because the code seems broken: there appears to be a wrong
>> timeout check in that code path.
>
> IIRC that path was contributed for BeOS support which AFAIK is not
> really tested anymore. However, it looks correct at a quick glance.
>
>> First, kwsysProcessGetTimeoutLeft is
>> called, as in the select() code path, but directly after that the
>> timeoutLength members are checked separately once more.
>
> The call to GetTimeoutLeft fills the members of timeoutLength.
> It also returns whether or not the timeout has already expired.
> The caller is supposed to use timeoutLength after the call.
>
>> With this check, it seems that all subprocesses "time out" immediately.
>
> At process start time we store an absolute TimeoutTime using the
> starting wall clock time plus the process timeout length. Later
> the GetTimeoutLeft subtracts the current time from the TimeoutTime.
> Print out the starting time, the computed TimeoutTime, and the
> timeoutLength that gets computed for each poll.
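To make that arithmetic concrete, here is a self-contained sketch with hypothetical helpers make_deadline(), timeout_left() and dump_timeout(). They mirror the scheme described above (absolute deadline fixed at start, remaining time recomputed on every poll) but are not the kwsys implementation; the timeout is assumed to be a non-negative number of seconds held in a double. The current time is passed in explicitly so the computation can be checked without a real clock; in real code it would come from gettimeofday().

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical: absolute deadline = start time + timeout (seconds, >= 0). */
static struct timeval make_deadline(struct timeval start, double timeout)
{
    struct timeval d = start;
    long whole = (long)timeout;
    /* Round the fractional part to the nearest microsecond. */
    long usec = (long)((timeout - (double)whole) * 1e6 + 0.5);

    d.tv_sec += whole;
    d.tv_usec += usec;
    if (d.tv_usec >= 1000000) { /* carry into seconds */
        d.tv_sec += 1;
        d.tv_usec -= 1000000;
    }
    return d;
}

/* Hypothetical: stores deadline - now in *left. Returns 1 (and zeroes
 * *left) if the deadline has been reached, 0 otherwise. */
static int timeout_left(struct timeval deadline, struct timeval now, struct timeval *left)
{
    long sec = (long)(deadline.tv_sec - now.tv_sec);
    long usec = (long)(deadline.tv_usec - now.tv_usec);

    if (usec < 0) { /* borrow one second */
        sec -= 1;
        usec += 1000000;
    }
    if (sec < 0 || (sec == 0 && usec == 0)) {
        left->tv_sec = 0;
        left->tv_usec = 0;
        return 1;
    }
    left->tv_sec = sec;
    left->tv_usec = usec;
    return 0;
}

/* Hypothetical debug output: start, deadline and remaining time per poll. */
static void dump_timeout(FILE *out, struct timeval start, struct timeval deadline,
                         struct timeval left)
{
    fprintf(out, "start=%ld.%06ld deadline=%ld.%06ld left=%ld.%06ld\n",
            (long)start.tv_sec, (long)start.tv_usec,
            (long)deadline.tv_sec, (long)deadline.tv_usec,
            (long)left.tv_sec, (long)left.tv_usec);
}
```

A poll loop would call timeout_left() once per iteration, treat a return value of 1 as expiry, and otherwise pass *left straight to select() as its timeout. If the printed deadline is not roughly the start time plus the configured timeout, the problem lies in the deadline computation rather than in the check.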
I'll have a look at that one too. Thanks for all the work :)
Cheers, Markus
>
> -Brad