[cmake-developers] Experiments in CMake support for Clang (header & standard) modules

Thu Aug 30 19:28:24 EDT 2018

On Fri, Aug 24, 2018 at 2:35 AM Stephen Kelly <steveire at gmail.com> wrote:

>
> On 24/08/18 02:32, David Blaikie wrote:
>
> On Tue, Jul 24, 2018 at 3:20 PM Stephen Kelly <steveire at gmail.com> wrote:
>
>> David Blaikie wrote:
>>
>> > (just CC'ing you Richard in case you want to read my ramblings/spot any
>> > inaccuracies, etc)
>> >
>> > Excuse the delay - coming back to this a bit now. Though the varying
>> > opinions on what modules will take to integrate with build systems still
>> > weigh on me a bit
>>
>> Can you explain what you mean by 'weighs on you'? Does that mean you see
>> it
>> as tricky now?
>
>
> Yes, to some extent. If the build system is going to require the
> compiler-calls-back-to-buildsystem approach that (from discussions with
> Richard & Nathan, etc) sounds like the reasonable one - yeah, I'd say that's
> a bigger change to the way C++ is compiled than I was expecting/thinking of
> going into this.
>
>
> Yes.
>
>
>
>
>> I've kind of been assuming that people generally think it is not tricky,
>> and
>> I'm just wrong in thinking it is and I'll eventually see how it is all
>> manageable.
>>
>
> I think it's manageable - the thing that weighs on me, I suppose, is
> whether or not the community at large will "buy" it, as such.
>
>
> Yes, that has been my point since I first started talking about modules. I
> don't think modules will gain a critical mass of adoption as currently
> designed (and as currently designed to work with buildsystems).
>

For myself, I don't think I'd go that far (I think the current design might
be feasible) - and I'm mostly trying to set aside those concerns to get to
more concrete things - to work through the build system ramifications,
prototype things, etc, to get more concrete experience/demonstration of the
possibilities and problems.

> And part of that is on the work we're doing to figure out the integration
> with build systems, etc, so that there's at least the first few pieces of
> support that might help gain user adoption to justify/encourage/provide
> work on further support, etc...
>
> Yes, reading the document Nathan sent us on June 12th this year, it seems
> that CMake would have to implement a server mode so that the compiler will
> invoke it with RPC. That server will also need to consume some data
> generated by CMake during buildsystem generation (eg user specified flags)
> and put that together with information sent by the compiler (eg ) in order
> to formulate a response. It's complex. Maybe CMake and other buildsystem
> generators can do it, but there are many bespoke systems out there which
> would have to have some way to justify the cost of developing such a thing.
>

Yeah, certainly a possibility - maybe it'd be enough of a wedge to cause
people to collapse the large build system space into fewer options - many
other languages require this sort of level of coupling between
compiler/build system/language.

But getting that momentum started would mean getting the main build systems
to support it & starting at the leaves (independent projects writing modular
code for themselves, even if their external dependencies aren't modular).
Then, with enough users downstream, some libraries might provide a modular
option (or maybe even ship only as modules, & that'd be the wedge I'm
talking about: teams/projects that don't support modules would be left out
a bit, which gives them an incentive to add modules support so they can use
these dependencies).

>> > The build.sh script shows the commands required to build it (though I
>> > haven't checked the exact -fmodule-file dependencies to check that they're
>> > all necessary, etc) - and with current Clang top-of-tree it does build and
>> > run the example dinnerparty program.
>>
>> Ok. I tried with my several-weeks-old checkout and it failed on the first
>> command with -modules-ts in it (for AbstractFruit.cppm - the simplest
>> one).
>>
>> I'll update my build and try again, but that will take some time.
>>
>
> Huh - I mean it's certainly a moving target - I had to file/workaround a
> few bugs to get it working as much as it is, so not /too/ surprising. Did
> you get it working in the end? If not, could you specify the exact revision
> your compiler's at and show the complete output?
>
>
> Yes, I got it working. See
>
>  https://www.mail-archive.com/cmake-developers@cmake.org/msg18623.html
>
>
>
>> > But I'm not sure how best to determine the order in which to build files
>> > within a library - that's where the sort of -MM-esque stuff, etc, would be
>> > necessary.
>>
>> Would it? I thought the -MM stuff would mostly be necessary for
>> determining
>> when to rebuild? Don't we need to determine the build order before the
>> first
>> build of anything? The -MM stuff doesn't help that.
>>
>
> -MM produces output separate from the compilation (so far as I can tell -
> clang++ -MM x.cpp doesn't produce anything other than the makefile fragment
> on stdout) & finds all the headers, etc. So that's basically the same as
> what we'd need here
>
>
> Are you sure? I thought compiling with -MM gives us information that we
> need before we compile the first time. Sorry if that was not clear from
> what I wrote above. I see a chicken-egg problem. However, I assume I'm just
> misunderstanding you (you said that -MM would be used to determine build
> order for the initial build) so let's just drop this.
>

Yeah, still a bit confused - not sure ignoring this tangent is useful,
maybe there's something in the misunderstanding here.

clang -MM doesn't compile the source file, and prints out something like:

f1.o: f1.cpp foo.h

So we know that f1.cpp needs foo.h. In this case -MM would error if foo.h
didn't exist, because it has to look through all the inclusions to report
the transitive ones. The same isn't true of modules. If, instead of foo.h,
this was the foo module (foo.cppm), and that module depends on bar.cppm,
then the output for f1.cpp would just be "f1.o: f1.cpp foo.pcm", without
needing to compile foo.cppm. The build system could then ask "what are the
dependencies for foo.cppm (knowing it can generate foo.pcm)" and get
"foo.pcm.o: foo.cppm bar.pcm", etc. Very rough/hand-wavy, but the general
idea I think is reasonable.

(there are some bonus wrinkles for legacy imports, but there are some ways
to address that too)
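
To make the hand-waving slightly more concrete, here's a minimal Python
sketch of what a build system might do with those per-file answers. Nothing
here is an existing Clang or CMake feature: the direct_deps table is a
made-up stand-in for the output of a hypothetical -MM-esque query, and I've
left the source files out so only generated outputs appear. It just orders
the outputs so each .pcm is built before its importers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical scan results: each output maps to the module artifacts it
# imports directly (non-transitive), following the f1/foo/bar example.
direct_deps = {
    "f1.o": ["foo.pcm"],
    "foo.pcm": ["bar.pcm"],
    "bar.pcm": [],
}

def build_order(deps):
    """Return outputs ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())

order = build_order(direct_deps)
print(order)
```

The point is only that non-transitive answers are enough: the build system
can chase the graph itself, one query per file, before compiling anything.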

> Looking at your example - if you have a library for all the fruits and
> libabstractfruit, libfruitsalad, libnotfruitsalad, and libbowls - then
> you'd have one module interface for each of those (AbstractFruit.cppm,
> FruitSalad.cppm, NotFruitSalad.cppm, Bowls.cppm) that would be imported (so
> replace "import Apple", "import Grape" with "import FruitSalad", etc... ) &
> the implementations could be in multiple files if desired (Apple.cpp,
> Grape.cpp, etc).
>
>
> Could you show me what that would look like for the repo? I am interested
> to know if this approach means concatenating the content of multiple files
> (eg Grape.h and Apple.h) and porting that result to a module. My instinct
> says that won't gain adoption.
>

Sure, let's see...

https://github.com/dwblaikie/ModulesExperiments/commit/4438a017c422c37106741253a78e2bd7ee99c43e

I mean it could be done in other ways - you could #include Grape.h and
Apple.h into Fruit.cppm, I suppose. Could allow you to keep the old headers
for non-modular users & just wrap them up in a module (same way I did for
the "std" module in this example).

>> >> Ok. That's not much better though. It still means editing/generating the
>> >> buildsystem each time you add an import.
>> >
>> >
>> > Isn't that true today with headers, though?
>>
>> No. Imagine you implemented FruitBowl.cpp in revision 1 such that it did
>> not
>> #include Grape.h and it did not add the Grape to the bowl.
>>
>> Then you edit FruitBowl.cpp to #include Grape.h and add the Grape to the
>> bowl. Because Grape.h and Apple.h are in the same directory (which you
>> already have a -Ipath/to/headers for in your buildsystem), in this
>> (today)
>> scenario, you don't have to edit the buildsystem.
>>
>
> Well, you don't have to do it manually, but your build system ideally
> should reflect this new dependency so it knows to rebuild FruitBowl.cpp if
> Grape.h changes.
>
>
> I never said it had to be done manually in the real world. I mentioned
> that in the context of your script. The point I keep making is that the
> buildsystem has to be regenerated.
>

*nod* Though the same seems to be true today when #includes change - the
build system has to become aware of the new dependencies that have been
introduced (either discovered during compilation with -MD (generating the
.o file and the .d Makefile fragment at the same time) or separately/ahead
of time with -MM (generating the Makefile fragment on stdout and not
performing compilation)).
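
For reference, consuming one of those Makefile fragments is not much work.
Here's a rough Python sketch of a reader for the "target: dep dep" form
shown earlier. It's a simplification I'm assuming for illustration (it
handles backslash line continuations but not escaped spaces or Windows
drive letters), not how ninja or make actually parse these files, and
bar.h is a made-up extra header.

```python
def parse_depfile(text):
    """Parse 'target: dep dep ...' rules into a {target: [deps]} dict.

    Simplified on purpose: joins backslash-continued lines, but does not
    handle escaped spaces or colons inside paths.
    """
    rules = {}
    joined = text.replace("\\\n", " ")  # fold continuation lines
    for line in joined.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        target, _, deps = line.partition(":")
        rules[target.strip()] = deps.split()
    return rules

# bar.h is hypothetical, added to show a continuation line.
fragment = "f1.o: f1.cpp foo.h \\\n  bar.h\n"
rules = parse_depfile(fragment)
print(rules)
```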

>> I wonder if people will use C++ modules if CMake/their generator has to be
>> re-run (automatically or through explicit user action) every time they
>> add
>> 'import foo;' to their C++ code... What do you think?
>>
>
> If it's automatic & efficient (I hope it doesn't redo all the work of
> discovery for all files - just the ones that have changed) it seems
> plausible to me.
>
>
> At least in the CMake case, the logic is currently coarse - if the
> buildsystem needs to be regenerated, the entire configure and generate
> steps are invoked.
>

When I add a new include to a file currently, it doesn't look like cmake
re-runs. Any idea why that is? I guess ninja knows enough to update for the
new include dependency - perhaps it'd need to rerun cmake if my new
#include was of a generated file (even if there were already rules for
generating that file?)?

If modules worked similarly to the way things seem to work here - importing
an external module that hadn't been imported anywhere by your project
before might rerun cmake, but importing that module into a second source
file would be akin to #including a file that already had a rule for
generating it - so it wouldn't rerun cmake, I don't think.
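
As a sketch of the rule I'm guessing at (this is my speculation, not a
description of what CMake or ninja do today), the check could be as simple
as comparing a file's imports against the modules the generated build
already has rules for. The function name and the module names are
hypothetical.

```python
def modules_needing_regeneration(known_rules, imports):
    """Return imported modules that no existing build rule can produce.

    An empty result means the build tool could pick up the new dependency
    on its own; a non-empty one means the generator would have to re-run.
    """
    return sorted(set(imports) - set(known_rules))

# Modules the generated build already knows how to produce (made up).
known = {"std", "AbstractFruit"}

already_known = modules_needing_regeneration(known, ["AbstractFruit"])
brand_new = modules_needing_regeneration(known, ["AbstractFruit", "Grape"])
print(already_known, brand_new)
```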

> Maybe that can be changed, but that's just more effort required on the
> part of all buildsystem generators, including bespoke ones. I think the
> level of effort being pushed on buildsystems is not well appreciated by the
> modules proposal.
>

Perhaps - and that's what I'm trying to work through & write up.

> What I see as a worst-case scenario is:
>
> * Modules gets added to the standard to much applause
> * Users realize that they have to rename all of their .h files to cppm and
> carefully change those files to use imports. There are new requirements
> regarding where imports can appear, and things don't work at first because
> of various reasons.
> * Maybe some users think that creating a module per library is a better
> idea, so they concat those new cppm files, sorting all the imports to the
> top.
> * Porting to Modules is hard anyway, because dependencies also need to be
> updated etc. Developers don't get benefits until everything is 'just right'.
> * Some popular buildsystems develop the features to satisfy the new
> requirements
> * Most buildsystems, which are bespoke, don't implement the GCC
> oracle-type stuff and just fudge things with parsing headers using a simple
> script which looks for imports. It kind of works, but is fragile.
> * Lots of time is spent on buildsystems being regenerated, because the
> bespoke systems don't get optimized in this new way.
> * After a trial run, most organizations that try modules reverse course
> and stop using them.
> * Modules deemed to have failed.
>
> Maybe I'm being too negative, but this seems to be the likely result to
> me. I think there are more problems lurking that we don't know about yet.
> But, I've said this before, and I still hope I'm wrong and just missing
> something.
>

I think that's certainly a possibility - but I'm approaching this
optimistically - rather than trying to prove it can't work, I'm trying to
figure out how it would work & if those solutions all turn out to be
unworkable/untenable by the community, then that's some useful data to feed
back into the committee process.
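
To illustrate the "simple script which looks for imports" worry from the
list above, here's roughly what such a script might look like in Python.
The regex and the sample source are my own invention, not taken from any
real build system. It finds imports without running the preprocessor, so it
also reports an import that is compiled out.

```python
import re

# Naive pattern: an optional 'export', then 'import <name>;' at line start.
IMPORT_RE = re.compile(
    r"^\s*(?:export\s+)?import\s+([A-Za-z_][\w.:]*)\s*;", re.MULTILINE
)

def scan_imports(source):
    """Return module names that textually appear in import declarations."""
    return IMPORT_RE.findall(source)

source = """
import Apple;
#if 0
import Grape;
#endif
"""

found = scan_imports(source)
print(found)  # Grape is reported even though '#if 0' disables it
```

That false dependency on Grape is the kind of fragility Stephen is
describing: it kind of works, until preprocessor state matters.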

>
>
> Sorry for the rather long delay on this - hopefully it helps us converge a
> little.
>
> I'll try to find some time to get back to my original prototype & your
> replies to do with that to see if I can flesh out the simpler "one module
> per library (with some of the inefficiency of just assuming strong
> dependencies between libraries, rather than the fine grained stuff we could
> do with -MM-esque support), no external modules" scenario (& maybe the
> retro/"header modules" style, rather than/in addition to the new C++
> modules TS/atom style) - would be great to have a reasonable prototype of
> that as a place to work from, I think.
>
>
> Yes, sounds interesting.
>
> There are other things we would want to explore then too. In particular,
> in my repo, all of the examples are part of the same buildsystem. We should
> model external dependencies too - ie, pretend each library has a
> standalone/hermetic buildsystem. That would mean that AbstractFruit would
> generate its own pcm files to build itself, but each dependency would also
> have to generate the AbstractFruit pcm files in order to compile against it
> as an external library (because pcm files will never be part of an install
> step, or a linux package or anything - they are not distribution artifacts).
>

Yep, eventually I'll get to that - trying to focus on small, incremental
steps to start with. The first adoption of modules will likely be on leaf
projects in part because of this complexity - and they can always wrap any
external dependencies in modules a la the 'std' module in my example.
Getting leaf projects adopting this functionality would be a good gateway
towards pressure on build systems to support that and eventually to support
modularized libraries.

- Dave

>
> Thanks,
>
> Stephen.
>
>
>