[cmake-developers] Experiments in CMake support for Clang (header & standard) modules

Stephen Kelly steveire at gmail.com
Mon May 7 12:01:47 EDT 2018


I think this discussion is more suited to the cmake-developers mailing
list. Moving there. Hopefully Brad or someone else can provide other
input from research already done.

On 05/07/2018 12:49 AM, David Blaikie wrote:
>
>>     The basic commands required are:
>>
>>       clang++ -fmodules -xc++ -Xclang -emit-module -Xclang
>>     -fmodules-codegen -fmodule-name=foo foo.modulemap -o foo.pcm
>>       clang++ -fmodules -c -fmodule-file=foo.pcm use.cpp
>>       clang++ -c foo.pcm
>>       clang++ foo.o use.o -o a.out
>
>     Ok. Fundamentally, I am suspicious of having to have a
>     -fmodule-file=foo.pcm for every 'import foo' in each cpp file. I
>     shouldn't have to manually add that each time I add a new import
>     to my cpp file. Even if it can be automated (eg by CMake), I
>     shouldn't have to have my buildsystem be regenerated each time I
>     add an import to my cpp file either.
>
>     That's something I mentioned in the google groups post I made
>     which you linked to. How will that work when using Qt or any other
>     library?
>
>
> - My understanding/feeling is that this would be similar to how a user
> has to change their link command when they pick up a new dependency.

Perhaps it would be interesting to get an idea of how often users need
to change their buildsystems because of a new link dependency, and how
often users add includes to existing c++ files.

I expect you'll find the latter to be a far bigger number.

I also expect that requiring users to edit their buildsystem, or to
allow it to be regenerated, every time they add or remove an include
would lead to lower adoption of modules. I can see people trying
modules and then giving up in frustration.

I think I read somewhere that the buildsystem at Google already
requires included '.h' files to be listed explicitly in the
buildsystem, so there would be no change in workflow there. For other
teams it would be a change in workflow, and one that might be rebelled
against.

By the way, do you have any idea how much modules adoption would be
needed to constitute "success"? Is there a goal there?

> Nope, scratch that ^ I had thought that was the case, but talking more
> with Richard Smith it seems there's an expectation that modules will
> be somewhere between header and library granularity (obviously some
> small libraries today have one or only a few headers, some (like Qt)
> have many - maybe those on the Qt end might have slightly fewer
> modules than they have headers - but still several modules to one
> library most likely, by the sounds of it)

Why? Richard, maybe you can answer that? These are the kinds of things
I was trying to get answers to in my previous post to ISO SG2 in the
google group. I didn't get an answer as definitive as this, so maybe
you can share the reasoning behind such a definitive answer?

> Now, admittedly, external dependencies are a little more complicated
> than internal (within a single project consisting of multiple
> libraries) - which is why I'd like to focus a bit on the simpler
> internal case first.

Fair enough.

>  
>
>     Today, a beginner can find a random C++ book, type in a code
>     example from chapter one and put `g++ -I/opt/book_examples
>     prog1.cpp` into a terminal and get something compiling and
>     running. With modules, they'll potentially have to pass a whole
>     list of module files too.
>
>
> Yeah, there's some talk of supporting a mode that doesn't explicitly
> build/use modules in the filesystem, but only in memory for the
> purpose of preserving the isolation semantics of modules. This would
> be used in simple direct-compilation cases like this. Such a library
> might need a configuration file or similar the compiler can parse to
> discover the parameters (warning flags, define flags, whatever else)
> needed to build the BMI.

Perhaps. I'd be interested in how far into the book such a system would
take a beginner. Maybe that's fine, I don't know. Such a system might
not help with code in stack overflow questions/answers though, which
would probably be simpler to keep writing with includes (eg for
Qt/boost).

Library authors will presumably have some say, or try to introduce some
'best practice' for users to follow. And such best practice will be
different for each library.
 
>  
>
>     I raised some of these issues a few years ago regarding the clang
>     implementation with files named exactly module.modulemap:
>
>     http://clang-developers.42468.n3.nabble.com/How-do-I-try-out-C-modules-with-clang-td4041946.html
>
>     http://clang-developers.42468.n3.nabble.com/How-do-I-try-out-C-modules-with-clang-td4041946i20.html
>
>     Interestingly, GCC is taking a directory-centric approach in the
>     driver (-fmodule-path=<dir>) as opposed to the 'add a file to your
>     compile line for each import' that Clang and MSVC are taking:
>
>      http://gcc.gnu.org/wiki/cxx-modules
>
>     Why is Clang not doing a directory-centric driver-interface? It
>     seems to obviously solve problems. I wonder if modules can be a
>     success without coordination between major compiler and
>     buildsystem developers. That's why I made the git repo - to help
>     work on something more concrete to see how things scale.
>
>
> 'We' (myself & other Clang developers) are/will be talking to GCC
> folks to try to get consistency here, in one direction or another
> (maybe some 3rd direction different from Clang or LLVM's). As you
> noted in a follow-up, there is a directory-based flag in Clang now,
> added by Boris as he's been working through adding modules support to
> Build2.

I just looked through the commits from Boris, and it seems he made some
changes relating to -fmodule-file=. That still presupposes that all
(transitively) used module files are specified on the command line.

I was talking about the -fprebuilt-module-path option added by Manman
Ren in https://reviews.llvm.org/D23125 because that actually relieves
the user/buildsystem of maintaining a list of all used modules (I hope).
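
Something like the following is roughly what I have in mind from the
cmake side (a sketch only, assuming Clang's -fprebuilt-module-path=
flag and a convention of emitting all of a project's BMIs into one
build directory):

 # all .pcm files are emitted into one directory, so a consumer only
 # needs to be pointed at that directory rather than at each module
 set(PREBUILT_MODULE_DIR ${CMAKE_BINARY_DIR}/pcm)
 target_compile_options(bar PRIVATE
   -fmodules -fprebuilt-module-path=${PREBUILT_MODULE_DIR})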

>     Having just read all of my old posts again, I still worry things
>     like this will hinder modules 'too much' to be successful. The
>     more (small) barriers exist, the less chance of success. If
>     modules aren't successful, then they'll become a poisoned chalice
>     and no one will be able to work on fixing them. That's actually
>     exactly what I expect to happen, but I also still hope I'm just
>     missing something :). I really want to see a committee document
>     from the people working on modules which actually explores the
>     problems and barriers to adoption and concludes with 'none of
>     those things matter'. I think it's fixable, but I haven't seen
>     anyone interested enough to fix the problems (or even to find out
>     what they are).
>
>
> Indeed - hence my desire to talk through these things, get some
> practical experience, document them to the committee in perhaps a
> less-ranty, more concrete form along with pros/cons/unknowns/etc to
> hopefully find some consistency, maybe write up a document of "this is
> how we expect build systems to integrate with this C++ feature", etc.

Great. Nathan Sidwell already wrote a paper which is clearer than I am
on some of the problems:

 http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0778r0.pdf

However, he told me it 'wasn't popular'. I don't know whether he means
the problems themselves were dismissed, or just his proposed solution.

Nevertheless, I recommend reading the problems stated there.

>
>>     My current very simplistic prototype, to build a module file, its
>>     respective module object file, and include those in the
>>     library/link for anything that depends on this library:
>>
>>       add_custom_command(
>>               COMMAND ${CMAKE_CXX_COMPILER} ${CMAKE_CXX_FLAGS} -xc++
>>     -c -Xclang -emit-module -fmodules -fmodule-name=Hello
>>     ${CMAKE_CURRENT_SOURCE_DIR}/module.modulemap -o
>>     ${CMAKE_CURRENT_BINARY_DIR}/hello_module.pcm -Xclang
>>     -fmodules-codegen
>>               DEPENDS module.modulemap hello.h
>
>     Why does this command depend on hello.h?
>
>
> Because it builds the binary module interface (hello_module.pcm) that
> is a serialized form of the compiler's internal representation of the
> contents of module.modulemap which refers to hello.h (the modulemap
> lists the header files that are part of the module). This is all using
> Clang's current backwards semi-compatible "header modules" stuff. In a
> "real" modules system, ideally there wouldn't be any modulemap. Just a
> .cppm file, and any files it depends on (discovered through the build
> system scanning the module imports, or a compiler-driven .d file style
> thing).
>
> Perhaps it'd be better for me to demonstrate something closer to the
> actual modules reality, rather than this retro header modules stuff
> that clang supports.

That would be better for me. I'm interested in modules-ts, but I'm not
interested in clang-modules.

>  
>
>     If that is changed and module.modulemap is not, what will happen?
>
>
> If hello.h is changed and module.modulemap is not changed? The
> hello_module.pcm does need to be rebuilt.

Hmm, this assumes that the pcm/BMI only contains declarations and not
definitions, right? I think clang outputs the definitions in a separate
object file, but GCC currently doesn't. Perhaps that's a difference that
cmake has to account for or pass on to the user.

>
> Ideally all of this would be implicit (maybe with some
> flag/configuration, or detected based on new file extensions for C++
> interface definitions) in the add_library - taking, let's imagine, the
> .ccm (let's say, for argument's sake*) file listed in the
> add_library's inputs and using it to build a .pcm (BMI), building that
> .pcm as an object file along with all the normal .cc files,

Ok, I think this is the separation I described above.

> * alternatively, maybe they'll all just be .cc files & a build system
> would be scanning the .cc files to figure out dependencies & could
> notice that one of them is the blessed module interface definition
> based on the first line in the file.

Today, users have to contend with errors resulting from their own code
being incorrect, using some 3rd party template incorrectly, linking not
working due to incorrect link dependencies, and incorrect compiles due
to missing include directories (or incorrect defines specified). I can
see incorrect inputs to module generation being a new category of errors
to confuse users.

For example, if in your list of files there are two files which look
like the blessed module interface based on the first line in the file,
there will be something to debug.

> So I suppose the more advanced question: Is there a way I can extend
> handling of existing CXX files (and/or define a new kind of file, say,
> CXXM?) specified in a cc_library. If I want to potentially check if a
> .cc file is a module, discover its module dependencies, add new rules
> about how to build those, etc. Is that do-able within my cmake
> project, or would that require changes to cmake itself? (I'm happy to
> poke around at what those changes might look like)

One of the things users can do in order to ensure that CMake works best
is to explicitly list the cpp files they want compiled, instead of
relying on globbing as users are prone to want to do:

 https://stackoverflow.com/questions/1027247/is-it-better-to-specify-source-files-with-glob-or-each-file-individually-in-cmak

If globbing is used, adding a new file does not cause the buildsystem
to be regenerated, and you won't have a working build until you
explicitly run cmake again.
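
For comparison, this is the globbing pattern that causes the problem
(a sketch; 'foo' is just a stand-in target name):

 # the glob is expanded when cmake runs, so a newly added .cpp file
 # is not built until cmake is explicitly re-run
 file(GLOB foo_sources ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
 add_library(foo ${foo_sources})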

I expect you could get into similar problems with modules - needing a
module to be regenerated because its dependencies change (because it
exports what it imports from a dependency for example). I'm not sure
anything can be done to cause cmake to reliably regenerate the module in
that case. It seems similar to the globbing case to me.

But aside from that you could probably experimentally come up with a way
to do the check for whether a file is a module and discover its direct
dependencies using file(READ). You might want to delegate to a script in
another language to determine transitive dependencies and what
add_custom{_command,_target} code to generate.
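
As a rough sketch (assuming modules-ts style source and a hypothetical
.ccm extension), the configure-time check using file(READ) could look
something like:

 # classify foo.ccm as a module interface if it declares one, and
 # collect its direct imports for later dependency handling
 file(READ ${CMAKE_CURRENT_SOURCE_DIR}/foo.ccm _content)
 if("${_content}" MATCHES "(^|\n)[ \t]*export[ \t]+module[ \t]+([A-Za-z_][A-Za-z0-9_.]*)")
   set(_module_name ${CMAKE_MATCH_2})
   string(REGEX MATCHALL "(^|\n)[ \t]*import[ \t]+[A-Za-z_][A-Za-z0-9_.]*"
     _imports "${_content}")
   message(STATUS "foo.ccm defines ${_module_name}; imports: ${_imports}")
 endif()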

>
>>     But this isn't ideal - I don't /think/ I've got the dependencies
>>     quite right & things might not be rebuilding at the right times.
>>     Also it involves hardcoding a bunch of things like the pcm file
>>     names, header files, etc.
>
>     Indeed. I think part of that comes from the way modules have been
>     designed. The TS has similar issues.
>
>
> Sure - but I'd still be curious to understand how I might go about
> modifying the build system to handle this. If there are obvious things
> I have gotten wrong about the dependencies, etc, that would cause this
> not to rebuild on modifications to any of the source/header files -
> I'd love any tips you've got.

Sure. I didn't notice anything from reading, but I also didn't try it
out. You might need to provide a repo with the module.modulemap/c++
files etc that are part of your experiment. Or better, provide something
based on modules-ts that I can try out.

> & if there are good paths forward for ways to prototype changes to the
> build system to handle, say, specifying a switch/setting a
> property/turning on a feature that I could implement that would
> collect all the .ccm files in an add_library rule and use them to make
> a .pcm file - I'd be happy to try prototyping that.

cmGeneratorTarget has a set of methods like GetResxSources which return
a subset of the files provided to add_library/target_sources by
splitting them by 'kind'. You would probably extend ComputeKindedSources
to handle the ccm extension, add a GetCCMFiles() to cmGeneratorTarget,
then use that new GetCCMFiles() in the makefiles/ninja generator to
generate rules.

When extending ComputeKindedSources you could use

 if(Target->GetPropertyAsBool("MAKE_CCM_RULES"))

as a condition for populating the 'kind'. Then rules will only be created
for targets which use something like

 set_property(TARGET myTarget PROPERTY MAKE_CCM_RULES ON)

in cmake code.
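
At the project level the experiment would then presumably be driven by
something like this (hypothetical - the .ccm handling and the
MAKE_CCM_RULES property are both things the experiment would add):

 # hello.ccm is picked out by the extended ComputeKindedSources and a
 # BMI rule is generated for it because MAKE_CCM_RULES is set
 add_library(hello hello.ccm hello.cpp)
 set_property(TARGET hello PROPERTY MAKE_CCM_RULES ON)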

I'm guessing that's enough for you to implement what you want as an
experiment?

>>     Ideally, at least for a simplistic build, I wouldn't mind
>>     generating a modulemap from all the .h files (& have those
>>     headers listed in the add_library command - perhaps splitting
>>     public and private headers in some way, only including the public
>>     headers in the module file, likely). Eventually for the
>>     standards-proposal version, it's expected that there won't be any
>>     modulemap file, but maybe all headers are included in the module
>>     compilation (just pass the headers directly to the compiler).
>
>     In a design based on passing directories instead of files, would
>     those directories be redundant with the include directories?
>
>
> I'm not sure I understand the question, but if I do, I think the
> answer would be: no, they wouldn't be redundant. The system will not
> have precompiled modules available to use - because binary module
> definitions are compiler (& compiler version, and to some degree,
> compiler flags (eg: are you building this for x86 32 bit or 64 bit?))
> dependent.

Right. I discussed modules with Nathan Sidwell in the meantime and
realised this too.

>  
>
>     One of the problems modules adoption will hit is that all the
>     compilers are designing fundamentally different command line
>     interfaces for them.
>
>
> *nod* We'll be working amongst GCC and Clang at least to try to
> converge on something common.

Different flags would not be a problem for cmake at least, but if Clang
didn't have something like -fprebuilt-module-path and GCC did, that
would be the kind of 'fundamental' difference I mean.
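
If the spelling does end up differing per compiler, cmake could still
hide that behind one abstraction, e.g. (sketch only; the GCC flag is
the one from the wiki page above and may well change):

 # pick the directory-style module flag per compiler
 target_compile_options(bar PRIVATE
   $<$<CXX_COMPILER_ID:Clang>:-fprebuilt-module-path=${CMAKE_BINARY_DIR}/pcm>
   $<$<CXX_COMPILER_ID:GNU>:-fmodule-path=${CMAKE_BINARY_DIR}/pcm>)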

>>     This also doesn't start to approach the issue of how to build
>>     modules for external libraries - which I'm happy to
>>     discuss/prototype too, though interested in working to streamline
>>     the inter-library but intra-project (not inter-project) case first.
>
>     Yes, there are many aspects to consider.
>
>     Are you interested in design of a CMake abstraction for this
>     stuff? I have thoughts on that, but I don't know if your level of
>     interest stretches that far.
>
>
> Not sure how much work it'd be - at the moment my immediate interest
> is to show as much real-world/can-actually-run prototype with cmake as
> possible, either with or without changes to cmake itself (or a
> combination of minimal cmake changes plus project-specific recipes of
> how to write a user's cmake files to work with this stuff) or also
> showing non-working/hypothetical prototypes of what ideal user cmake
> files would look like with reasonable/viable (but not yet implemented)
> cmake support.

Yes, it's specifying the ideal user cmake files that I mean. Given that
the granularity of modules can be anywhere on the spectrum between
one-module-file-per-library and one-module-file-per-class, I think
cmake will need to consider one-module-file-per-library and
*not*-one-module-file-per-library separately.

In the *not*-one-module-file-per-library case, cmake might have to
delegate more to the user, so it would be more inconvenient for them.

In the one-module-file-per-library case, I think the ideal is something
like:

 add_library(foo foo.cpp)
 # assuming foo.h is a module interface file, this creates
 # a c++-module called foo and makes it an interface usage
 # requirement of the foo target defined above
 add_cxx_module(foo foo.h)

 # bar.cpp imports foo.
 add_library(bar bar.cpp)
 # bar links to foo, and a suitable compile line argument is added if
 # needed for the foo module.
 target_link_libraries(bar foo)

This would work best if foo.h did not contain

 module;
 export module foo;

(after http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0713r1.html)

but instead contained only

 module;

and the module name came from the buildsystem (or from the compiler
using the basename).

As it is, the above cmake code would have to determine the module name
from foo.h and throw an error if it was different from foo. Having the
module name inside the source just adds scope for things to be wrong. It
would be better to specify the module name on the outside.
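
Concretely, inside a hypothetical add_cxx_module() implementation that
check would be something like:

 # read the declared module name out of foo.h and require it to
 # match the name the buildsystem was given
 file(STRINGS foo.h _decl REGEX "^[ \t]*export[ \t]+module[ \t]+")
 string(REGEX REPLACE "^[ \t]*export[ \t]+module[ \t]+([A-Za-z_][A-Za-z0-9_.]*).*"
   "\\1" _declared "${_decl}")
 if(NOT _declared STREQUAL "foo")
   message(FATAL_ERROR "foo.h declares module '${_declared}', expected 'foo'")
 endif()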

I wonder what you think about keeping the module name out of the
source, and whether that can be changed in the modules-ts? My thoughts
on this, and on what the ideal in cmake would be, are changing as the
discussion continues.

>     Can you help? It would really help my understanding of where
>     things currently stand with modules.
>
>
> I can certainly have a go, for sure.

Great, thanks.

>  
>
>     For example, is there only one way to port the contents of the cpp
>     files?
>
>
> Much like header grouping - how granular headers are (how many headers
> you have for a given library) is up to the developer to some degree
> (certain things can't be split up), similarly with modules - given a
> set of C++ definitions, it's not 100% constrained how those
> definitions are exposed as modules - the developer has some freedom
> over how the declarations of those entities are grouped into modules.

Yes, exactly. This repo is small, but has a few libraries, so if we
start with one approach we should be easily able to also try a different
approach and examine what the difference is and what it means.

>  
>
>     After that, is there one importable module per class or one per
>     shared library (which I think would make more sense for Qt)?
>
>
> Apparently (this was a surprise to me - since I'd been thinking about
> this based on the Clang header modules (backwards compatibility stuff,
> not the standardized/new language feature modules)) the thinking is
> probably somewhere between one-per-class and one-per-shared-library.
> But for me, in terms of how a build file would interact with this,
> more than one-per-shared-library is probably the critical tipping point.

Yes. I think you're talking about the one-module-file-per-library and
*not*-one-module-file-per-library distinction I mentioned above.

> If it was just one per shared library, then I'd feel like the
> dependency/flag management would be relatively simple. You have to add
> a flag to the linker commandline to link in a library, so you have to
> add a flag to the compile step to reference a module, great. But, no,
> bit more complicated than that given the finer granularity that's
> expected here.

"finer granularity that's *allowed* here" really. If there is a simple
thing for the user to do (ie one-module-file-per-library), then cmake
can make that simple to achieve (because the dependencies between
modules are the same as dependencies between targets, which the user
already specifies with target_link_libraries).

If the user wants to do the more complicated thing
(*not*-one-module-file-per-library), then cmake can provide APIs for the
user to do that (perhaps by requiring the user to explicitly specify the
dependencies between modules).
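
For example (entirely hypothetical - extending the add_cxx_module()
idea from above with an explicit DEPENDS option):

 # several module files inside one library, with the inter-module
 # dependencies stated explicitly by the user
 add_library(foo foo_core.cpp foo_net.cpp)
 add_cxx_module(foo_core foo_core.h)
 add_cxx_module(foo_net foo_net.h DEPENDS foo_core)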

My point is that cmake's design can optimize for the easy way, and I
think users will choose the easy way most of the time.
 
>
>     The git repo is an attempt to make the discussion concrete because
>     it would show how multiple classes and multiple libraries with
>     dependencies could interact in a modules world. I'm interested in
>     what it would look like ported to modules-ts, because as far as I
>     know, clang-modules and module maps would not need porting of the
>     cpp files at all.
>
>
> Right, clang header-modules is a backwards compatibility feature. It
> does require a constrained subset of C++ to be used to be effective
> (ie: basically your headers need to be what we think of as
> ideal/canonical headers - reincludable, independent, complete, etc).
> So if you've got good/isolated headers, you can port them to Clang's
> header modules by adding the module maps & potentially not doing
> anything else - though, if you rely on not changing your build system,
> then that presents some problems if you want to scale (more cores) or
> distribute your build. Because the build system doesn't know about
> these  dependencies - so if you have, say, two .cc files that both
> include foo.h then bar.h - well, the build system runs two compiles,
> both compiles try to implicitly build the foo.h module - one blocks
> waiting for the other to complete, then they continue and block again
> waiting for bar.h module to be built. If the build system knew about
> these dependencies (what Google uses - what we call "explicit
> (header)modules") then it could build the foo.h module and the bar.h
> module in parallel, then build the two .cc files in parallel.

I think that the 'build a module' step should be a precondition to the
compile step. I think the compiler should issue an error if it
encounters an import for a module it doesn't find a file for. No one
expects a linker to compile foo.cpp into foo.o and link it just because
it encounters a fooFunc without a definition which was declared in foo.h.

That would reduce the magic, and would instead expect something like

 add_cxx_module(somename somefile.h otherfiles.h)

to specify a module file and its constituent partitions, which I think
is fine.

>>     Basically: What do folks think about supporting these sort of
>>     features in CMake C++ Builds? Any pointers on how I might best
>>     implement this with or without changes to CMake?
>
>     I think some design is needed up front. I expect CMake would want
>     to have a first-class (on equal footing with include directories
>     or compile definitions and with particular handling) concept for
>     modules, extending the install(TARGET) command to install module
>     binary files etc.
>
>
> Module binary files wouldn't be installed in the sense of being part
> of the shipped package of a library - because module binary files are
> compiler/flag/etc specific.

Ok.

Thanks,

Stephen.
