[CMake] Parallel build & test problem

Michael Hertling mhertling at online.de
Tue May 31 13:43:04 EDT 2011


On 05/30/2011 09:30 AM, Marcel Loose wrote:
>> Look at the following project for an example:
>>
>> # CMakeLists.txt:
>> CMAKE_MINIMUM_REQUIRED(VERSION 2.8 FATAL_ERROR)
>> PROJECT(PARALLEL C)
>> FILE(WRITE ${CMAKE_BINARY_DIR}/generate.txt "0\n")
>> ADD_CUSTOM_COMMAND(
>>     OUTPUT ${CMAKE_BINARY_DIR}/generate.txt
>>     COMMAND ${CMAKE_COMMAND}
>>     -DGENERATE=${CMAKE_BINARY_DIR}/generate.txt
>>     -P ${CMAKE_SOURCE_DIR}/generate.cmake)
>> FILE(WRITE ${CMAKE_BINARY_DIR}/f.c "void f(void){}\n")
>> FILE(WRITE ${CMAKE_BINARY_DIR}/g.c "void g(void){}\n")
>> ADD_LIBRARY(f SHARED f.c ${CMAKE_BINARY_DIR}/generate.txt)
>> ADD_LIBRARY(g SHARED g.c ${CMAKE_BINARY_DIR}/generate.txt)
>>
>> # generate.cmake:
>> IF(EXISTS ${GENERATE})
>>   FILE(STRINGS ${GENERATE} VAR)
>>   MATH(EXPR VAR ${VAR}+1)
>>   FILE(WRITE ${GENERATE} "${VAR}\n")
>> ELSE()
>>   FILE(WRITE ${GENERATE} "1\n")
>> ENDIF()
>>
>> After configuring, enter the following command:
>>
>> while true; do
>> (make clean; make -j1) | grep "generate\.txt";
>> echo "generate.txt: $(cat generate.txt)";
>> done
>>
>> You'll endlessly see "... Generating generate.txt" followed by
>> "generate.txt: 1" which is expected. Now, switch to parallel:
>>
>> while true; do
>> (make clean; make -j2) | grep "generate\.txt";
>> echo "generate.txt: $(cat generate.txt)";
>> done
>>
>> On my system, findings are:
>>
>> 1. Two messages "... Generating generate.txt". This means that both
>> Make processes run the custom command, i.e. none of these processes
>> finds the generate.txt file already generated by the other process.
>> 2. The generate.txt file's content varies between 1 and 2: If the
>> IF(EXISTS) command of one process is executed after the FILE(WRITE)
>> command of the other process, the result in generate.txt will be 2,
>> but if the one's IF(EXISTS) command is executed between the other's
>> IF(EXISTS) and FILE(WRITE) commands, both processes will find the
>> generate.txt file absent and write it with a content of 1. That's
>> a typical race condition among the Make processes with j2 or more.
>> 3. If I replace '| grep "generate\.txt"' with '> /dev/null' in the
>> above-noted command, the following error occurs from time to time:
>>
>>> CMake Error at .../generate.cmake:3 (MATH):
>>>   math cannot parse the expression: "+1": syntax error, unexpected exp_PLUS,
>>>   expecting exp_OPENPARENT or exp_NUMBER (1)
>>
>> Supposedly, the reason is that the FILE(STRINGS) command of one process
>> is executed during the other's FILE(WRITE) command, i.e. between open()
>> and close() when the generate.txt file is open for writing but not yet
>> closed and, thus, empty, so the VAR variable will be empty, too, and
>> the MATH() command will fail. That's also a typical race condition.
>>
>> This example shows that CMake-generated Makefiles might rate targets as
>> sufficiently independent to be built in parallel although these targets
>> are coupled by a custom command which - due to its construction - is
>> sensitive to parallel execution and gives rise to a race condition.
>> So, the warning w.r.t. building multiple targets in parallel must
>> be taken absolutely seriously.
>>
>> However, David, things can even get worse: Enhance the above-noted
>> PARALLEL project's CMakeLists.txt with the following three lines:
>>
>> FILE(WRITE ${CMAKE_BINARY_DIR}/main.c "int main(void){return 0;}\n")
>> ADD_EXECUTABLE(main main.c)
>> TARGET_LINK_LIBRARIES(main f g)
>>
>> Most certainly, this is a quite common configuration, and IMO, it's
>> perfectly legal to build with "make -j2 main", i.e. explicitly only
>> one target in parallel. When I enter the following lines
>>
>> while true; do
>> (make clean; make -j2 main) > /dev/null;
>> echo "generate.txt: $(cat generate.txt)";
>> done
>>
>> I see plenty of "generate.txt: 2" messages, i.e. the custom command is
>> still executed twice and usually in sequence, but sometimes there's a
>> "generate.txt: 1" message. Obviously, the race condition hasn't gone.
>>
>> What does this mean in regard to parallel building a single target?
>> Are independent targets the only ones that can be reliably built in
>> parallel, i.e. do I have to build f,g and main individually in the
>> correct order by hand? That would be strange, IMO. Maybe, someone
>> can shed light upon this issue.
>>
> 
> Hi Michael,
> 
> Nice example. Do you know, by any chance, if this only happens with
> custom targets/commands? I get the feeling that race conditions only
> seem to happen with these user-defined thingies.

That's my assumption, too. Via their COMMANDs, custom targets/commands
can be interrelated with other targets, and neither CMake nor Make
recognizes such interrelations. Thus, targets might be considered
parallelizable by CMake and Make although they actually aren't, due to
an interrelation other than a usual dependency. In fact, it's very easy
to construct an example:

CMAKE_MINIMUM_REQUIRED(VERSION 2.8 FATAL_ERROR)
PROJECT(PARALLEL2 NONE)
SET(CMAKE_VERBOSE_MAKEFILE ON)
ADD_CUSTOM_TARGET(x ALL
    COMMAND ${CMAKE_COMMAND} -E touch x.dat
    COMMAND ${CMAKE_COMMAND} -E remove y.dat)
ADD_CUSTOM_TARGET(y ALL
    COMMAND ${CMAKE_COMMAND} -E touch y.dat
    COMMAND ${CMAKE_COMMAND} -E remove x.dat)

Targets x and y are strongly related because they directly affect
each other, but there's no declared dependency between them. So, they
are considered parallelizable despite the obvious race condition:
When building with "make -j1 x y" or "make -j1 y x", the result is
predictable and reproducible as expected, of course, but with -j2 or
higher, sometimes x.dat survives, sometimes y.dat, and sometimes
neither. An easy solution is the introduction of an explicit
dependency, e.g. ADD_DEPENDENCIES(y x). This prevents the two
targets from being built in parallel, and consequently, the race
condition vanishes.
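
For completeness, here is a sketch of the PARALLEL2 project with that
explicit dependency in place. Nothing is new except the last line; the
choice of ordering y after x (rather than x after y) is arbitrary.

```cmake
# CMakeLists.txt (PARALLEL2 with the explicit dependency added)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8 FATAL_ERROR)
PROJECT(PARALLEL2 NONE)
SET(CMAKE_VERBOSE_MAKEFILE ON)
ADD_CUSTOM_TARGET(x ALL
    COMMAND ${CMAKE_COMMAND} -E touch x.dat
    COMMAND ${CMAKE_COMMAND} -E remove y.dat)
ADD_CUSTOM_TARGET(y ALL
    COMMAND ${CMAKE_COMMAND} -E touch y.dat
    COMMAND ${CMAKE_COMMAND} -E remove x.dat)
# y is ordered after x, so the two COMMAND sequences no longer overlap
# when the default target is built in parallel.
ADD_DEPENDENCIES(y x)
```

With a plain parallel build of the default target, e.g. "make -j2", x
should now always finish before y starts, so y.dat should be the file
that is left over. Naming both targets on the command line is a
different matter and remains subject to the warning mentioned earlier.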

> Maybe this should be put in the issue tracker? 

At least, some aspects should be clarified. My analysis of the
goings-on in the PARALLEL project from my previous posting is:

(1) Custom commands may be sensitive to parallel execution, but this
is trivial, and [C]Make has no idea what happens in a custom command.
Of course, the generate.cmake script has been deliberately designed
to be non-reentrant in order to point out the possible implications.
(2) Two or more targets may be considered parallelizable by [C]Make
although they share a custom command and, thus, are interrelated in
this manner. When building these targets in parallel, the custom
command runs more than once, typically once per target if the number
of Make processes is large enough. This fact alone is already a
significant difference between building with -j1 and -j2 or higher.
(3) If (1) and (2) coincide, the affected targets cannot be built
reliably in parallel. This happens with PARALLEL's f and g targets.
(4) If another target depends on two or more targets as in (3), this
single target cannot be built reliably in parallel, either. That's the
example of PARALLEL's additional main target.
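
As an aside to (1): a script can be made safe against concurrent runs
by serializing them. The following is only a sketch, and it assumes a
CMake version that provides FILE(LOCK) (3.2 or newer, i.e. not the 2.8
used in this thread); the lock file name "${GENERATE}.lock" is a
hypothetical choice of mine.

```cmake
# generate.cmake (sketch): same logic as before, serialized by a lock.
# Assumption: CMake 3.2 or newer, where FILE(LOCK) is available.
# The lock is advisory and, with the default guard, is held until this
# script's process exits; a second instance blocks here until then.
FILE(LOCK "${GENERATE}.lock")
IF(EXISTS "${GENERATE}")
  FILE(STRINGS "${GENERATE}" VAR)
  MATH(EXPR VAR "${VAR}+1")
  FILE(WRITE "${GENERATE}" "${VAR}\n")
ELSE()
  FILE(WRITE "${GENERATE}" "1\n")
ENDIF()
```

Note that this addresses (1) only: the read-modify-write sequence can
no longer interleave, but the custom command would still run once per
target as described in (2).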

An easy solution for (2) and (3) is - similar to the approach described
above for the PARALLEL2 project - the introduction of a custom target
which depends on the custom command's OUTPUT and is a prerequisite of
the targets in question. Thereafter, the custom command is run by that
one target only, before the others are built, and again, the race
condition is eliminated. However, that's something one must know and
take into account.
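
Applied to the PARALLEL project, this could look as follows. It is a
sketch based on my understanding of the Makefile generators; the target
name generate_txt is a hypothetical choice, and the rest is the project
from my previous posting, including the additional main target.

```cmake
# CMakeLists.txt (PARALLEL with a custom target driving the command)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8 FATAL_ERROR)
PROJECT(PARALLEL C)
FILE(WRITE ${CMAKE_BINARY_DIR}/generate.txt "0\n")
ADD_CUSTOM_COMMAND(
    OUTPUT ${CMAKE_BINARY_DIR}/generate.txt
    COMMAND ${CMAKE_COMMAND}
    -DGENERATE=${CMAKE_BINARY_DIR}/generate.txt
    -P ${CMAKE_SOURCE_DIR}/generate.cmake)
# Hypothetical target name: the single place where the command is driven.
ADD_CUSTOM_TARGET(generate_txt DEPENDS ${CMAKE_BINARY_DIR}/generate.txt)
FILE(WRITE ${CMAKE_BINARY_DIR}/f.c "void f(void){}\n")
FILE(WRITE ${CMAKE_BINARY_DIR}/g.c "void g(void){}\n")
ADD_LIBRARY(f SHARED f.c ${CMAKE_BINARY_DIR}/generate.txt)
ADD_LIBRARY(g SHARED g.c ${CMAKE_BINARY_DIR}/generate.txt)
# f and g wait for generate_txt, so generate.txt already exists and is
# up to date by the time their own rules look at it.
ADD_DEPENDENCIES(f generate_txt)
ADD_DEPENDENCIES(g generate_txt)
FILE(WRITE ${CMAKE_BINARY_DIR}/main.c "int main(void){return 0;}\n")
ADD_EXECUTABLE(main main.c)
TARGET_LINK_LIBRARIES(main f g)
```

With this in place, I'd expect the "make clean; make -j2 main" loop to
report "generate.txt: 1" consistently, since the command should run
only once per build, but I have not verified this for all generators.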

> I haven't had any issues with race condition when building just one
> target in parallel, but I *do* have occasional broken builds when
> building multiple targets in parallel. In that case, one of the targets
> is a custom target. I therefore recommended my colleague to build one
> target at a time when doing parallel builds.

As the PARALLEL project's main target shows, building only one target
at a time does not ensure that this can be done reliably in parallel.
Rather, the CMakeLists.txt files must meet some requirements in this
regard, especially w.r.t. custom targets/commands. The essence, IMO:

Custom targets/commands should be examined with regard to possible,
more or less subtle, interrelations with other targets that cannot be
recognized by CMake and Make. Typical cases are side effects of the
COMMAND or custom commands whose OUTPUT is referred to by more than
one target. If such an interrelation is noticed, one should consider
adding a further custom target with appropriate dependencies, so that
CMake/Make no longer consider the affected targets parallelizable.

Recently, there has been a related discussion [1] about quite the same
issue, i.e. a custom command which is referred to by multiple targets
and probably performs a non-reentrant action. Because that discussion
didn't conclude with a definitive result and this topic is somewhat
non-trivial, I'd be very interested in a comment by David Cole or
colleagues, in particular concerning multiply referenced custom
commands in parallel builds.

Regards,

Michael

[1] http://www.mail-archive.com/cmake@cmake.org/msg32782.html

