[cmake-developers] Generating buildsystem metadata from CMake

Sat Mar 21 04:41:22 EDT 2015

Anton Makeev wrote:

> The other thing that seems troubling to me is that since file, target,
> language compiler options are split into different parts of metadata, the
> IDE need to know exactly how to assemble them back into the compiler’s
> command line (e.g. what flags go first file’s or language’s), duplicating
> CMake's logic that may be different from version to version and from
> compiler to compiler. The exact command line is needed to get the actual
> and precise defines, include search paths etc. from the compiler.

Yes. I previously proposed the <lang>_compile_command property to contain
information about how to build it:

 http://www.steveire.com/cmake-future/manual/cmake-metadata-generation.7.html#optional-properties

However, I think it might be better to put something similar to what is
currently generated in compile_commands.json into cmake-metadata.json. That
is, we would generate (in some context)

 "include_directories" : ["/foo", "/opt"],
 "compile_definitions" : ["DEF=\"Foo\"", "OTHER_DEF=1"],
 "compile_command": "-c -DDEF=\"Foo\" -DOTHER_DEF=1 -I/foo -I/opt"

So, "compile_command" contains approximately what you can currently get from
compile_commands.json.

The other properties contain things which are specifically known to be
include directories or compile definitions, as JSON arrays. These
properties are obviously redundant information, so I wonder if they should
be generated at all? Is the compile command I wrote above easy to parse? Or 
is it sufficiently difficult to parse that this redundant information should 
be provided?
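
As a rough illustration of the parsing question, here is a minimal Python
sketch that recovers include directories and definitions from a command
string like the one above. It assumes POSIX shell quoting and only the -I
and -D flag spellings; the function name split_compile_command is made up
for this example and is not part of any proposed format.

```python
import shlex


def split_compile_command(command):
    """Return (include_directories, compile_definitions) found in command.

    Assumption: POSIX-style quoting, and only -I/-D (joined or separated).
    """
    includes, defines = [], []
    args = iter(shlex.split(command))
    for arg in args:
        for flag, bucket in (("-I", includes), ("-D", defines)):
            if arg == flag:
                # Separated form, e.g. "-I /foo": the value is the next token.
                value = next(args, None)
                if value is not None:
                    bucket.append(value)
                break
            if arg.startswith(flag):
                # Joined form, e.g. "-I/foo".
                bucket.append(arg[len(flag):])
                break
    return includes, defines


includes, defines = split_compile_command(
    '-c -DDEF="Foo" -DOTHER_DEF=1 -I/foo -I/opt'
)
print(includes)
print(defines)
```

Shell-style splitting consumes the quotes around Foo, so this yields DEF=Foo
rather than DEF="Foo". A consumer has to know which quoting rules apply to
the command string, which is one argument for keeping the separate arrays.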

I have no idea if such "compile_command" can be generated for VS or Xcode, 
or if constructing such compile commands is done internally by those tools. 
So, this may not be a portable solution anyway.

> This would be really helpful indeed, currently, we have to introspect
> CMakeLists.txt files in order to find the most probably place where new
> files should be placed (works only in basic cases now). And being able to
> do so correctly is also crucial for refactoring (e.g. extract class).

Given the backtrace, you can navigate up the scope from the most recent 
frame to get out of any functions, macros or loops. You can then add a 
target_sources() line directly after that. 

That algorithm will work for every case (not just basic cases) as far as I 
can tell and is available with CMake 3.1.
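
The backtrace walk described above could be sketched as follows. This is
only an illustration: the frame layout (a list of dicts, most recent frame
first, with "path", "line" and a "scope" field saying whether the call sits
at file level or inside a function, macro or loop) is a hypothetical
structure, since the metadata format is not decided yet. It also assumes
the command at the chosen line fits on one line.

```python
def insertion_point(backtrace):
    """Return (path, line) of the first frame that is at file scope.

    backtrace is ordered most recent frame first; the "scope" key is a
    hypothetical field, not something CMake is known to emit.
    """
    for frame in backtrace:
        if frame["scope"] == "file":
            return frame["path"], frame["line"]
    return None


def add_target_sources(lines, after_line, target, source):
    """Return a copy of lines with a target_sources() call inserted
    directly after the 1-based line number after_line."""
    new_line = f"target_sources({target} PRIVATE {source})"
    return lines[:after_line] + [new_line] + lines[after_line:]


backtrace = [
    {"path": "helpers.cmake", "line": 3, "scope": "function"},
    {"path": "CMakeLists.txt", "line": 1, "scope": "file"},
]
path, line = insertion_point(backtrace)
print(path, line)
print(add_target_sources(["add_my_exe(app)", "message(done)"], line, "app", "new.cpp"))
```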

>> * I didn't document the location or directory.  I'm not clear on whether
>>  it is supposed to be the build location, or the install location(s!),
>>  or all of those.
> 
> It would be useful, though, to have a location of generated files for each
> target: in case metadata misses some information (and I think it won’t
> cover every possible need anytime soon), IDE will be able to get if from
> generated makefiles.

Yes, we can at least provide the build location in an obvious way. We can 
discuss install locations eventually.

>> * I don't generate 'dependencies' (actually the list of files which the
>>  buildsystem re-generation depends on) as Aleix did, because there is no
>>  well-defined usefulness for that list yet.
> 
> As Tobias pointed, we at least need to know what files are the part of
> CMake project, that is, the list of all CMakeLists.txt and *.cmake files,
> used for generation (ideally, including missing ones, since in that case
> IDE could be able to tell when missing file is created and refresh the
> project)

As I wrote to Tobias, I'm apprehensive about this, and it would require 
other work to make cmake parallel safe first. 

I think if the IDE does not have focus it should not be running 'cmake .' on 
my behalf. I think if the IDE newly gets focus you can maybe run 'cmake .' 
at *that* point (after the user is done with their rebase or whatever). That 
doesn't require giving you a list of files to watch. Maybe I'm missing 
something though.

>> * Some more information from project() may be relevant, but it's not
>>  clear yet. We will likely know more when we have decided the file format
>>  and generated some 'interesting' metadata files.
> 
> Project name, list of the configurations are most needed ones.
> We also use CMAKE_<lang>_SOURCEFILE_EXTENSIONS to determine if a given
> file is potentially source file or not.

As CMake already knows which files are 'object sources', the metadata will
provide that. Also, the <lang> extensions are not enough. See the unit test I
created and in particular the compiled_as_cxx.c file.

> This has already been discussed but I give our usage scenario:
> 
> in CLion we retrieve the list of all build types (aka configurations,
> Debug, Release etc) 

From where do you currently retrieve this list? I guess you look at all
cache keys named

 CMAKE_.*_FLAGS_(.*)

and list the matches?
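
For illustration, the guess above could look like this in Python. It is a
sketch of the guessed approach, not a description of what CLion actually
does. It relies on the KEY:TYPE=VALUE line layout of CMakeCache.txt and
skips the KEY-ADVANCED helper entries, which would otherwise match the
pattern too; the function name is made up for this example.

```python
import re

# The pattern from the text above, applied to the whole cache key.
KEY_PATTERN = re.compile(r"CMAKE_.*_FLAGS_(.*)")


def configurations_from_cache(cache_text):
    """Collect configuration names from the contents of a CMakeCache.txt."""
    names = set()
    for line in cache_text.splitlines():
        # Skip comments, help strings and anything that is not KEY:TYPE=VALUE.
        if line.startswith(("#", "//")) or "=" not in line:
            continue
        key = line.split("=", 1)[0].split(":", 1)[0]
        if key.endswith("-ADVANCED"):
            continue
        match = KEY_PATTERN.fullmatch(key)
        if match:
            names.add(match.group(1))
    return sorted(names)


sample = """\
// Flags used by the compiler during debug builds.
CMAKE_CXX_FLAGS_DEBUG:STRING=-g
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_EXE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_CXX_FLAGS:STRING=
CMAKE_CXX_FLAGS_DEBUG-ADVANCED:INTERNAL=1
"""
print(configurations_from_cache(sample))
```

Any configuration that has no matching cache entry would not be found this
way.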

> and then generate project using Makefiles generator
> for each of them. This is necessary because of several reasons: 1) To be
> able to correctly build language model, we need to know, when a file is
> used in several configurations, which means, it's compiler settings and
> macros are different.
>     E.g. some branches of code may not be available in Debug or Release
>     and we give user an option to quickly switch between them in the
>     editor.

This seems similar to what Tobias talked about.

> I don’t know if it’s possible at all, but it would be great if we could
> have info for all configurations generated in one go (not only for
> multi-config, but for single-config generators as well like Ninja and
> Makefiles).

I can think of two ways to make that possible:

1) Create new multi-config generators, or add options for the existing ones.
2) Add a generic multi-configuration mode to cmake:
 http://thread.gmane.org/gmane.comp.programming.tools.cmake.devel/10873/focus=10912
 http://public.kitware.com/Bug/view.php?id=14539

I consider both out of scope for this thread though.

> As a side note, it seems more natural to me to have one json file with one
> or several configurations listed, providing that there is also shared
> project info that should be in that files. something like that:

I'm going to post a separate mail about this. I think how to handle multiple
configurations is the most pressing point we need to design for here. We
can't make further progress until that part is designed.

>> * Generating metadata only (without generating buildsystem files) is not
>>  currently in scope.  This was requested several times, but it is not
>>  clear why.
> 
> It’s simply to be able to get this the information as quickly as possible.
> I’m not sure which part is most slow, but, say, InsightToolKit 4.5
> (http://www.itk.org/Wiki/ITK/Source),
> generates in couple of minutes. 

CMake has a 'configure' step followed by a 'generate step'. Your count of 
minutes must be the sum of both. The information determined during those 
steps is exactly what the metadata file should contain. Avoiding all of it 
would leave you with no metadata.

> The regeneration, even when nothing was
> changes, a few dozens of seconds.

This cost is mostly the 'generate step', as everything from the 'configure
step' was cached and available for re-generation.

> Plus, we’d prefer being able to open the project without any questions to
> user, e.g. not asking, which generator he/she prefers. If we generate
> using ‘wrong’ default generator we’ll need to regenerate everything again
> when user decides to change it.

It seems like you can use a throwaway temp directory until the user chooses
a generator. I am sympathetic to the idea of 'not breaking the user's flow',
but I don't currently have any idea how to avoid it.

Everything does indeed have to be re-generated if the generator is changed, 
and cmake currently issues an error if you use a different -G option than 
was originally used in a build dir. If you really want to change that 
behavior, I suggest a separate bug report to track the idea. As I said I am 
sympathetic to the idea, but I don't see a way. If you file a bug, maybe 
Brad will have an idea or can say it's fundamentally out of scope.
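
Until then, a tool can at least avoid running into that error by checking
which generator an existing build dir was created with. A minimal Python
sketch, assuming the CMAKE_GENERATOR entry that cmake records in
CMakeCache.txt; the function names are made up for this example.

```python
def cached_generator(cache_text):
    """Return the generator recorded in CMakeCache.txt contents, or None."""
    for line in cache_text.splitlines():
        key, sep, value = line.partition("=")
        # Cache lines look like KEY:TYPE=VALUE; compare the bare key only.
        if sep and key.split(":", 1)[0] == "CMAKE_GENERATOR":
            return value
    return None


def needs_fresh_build_dir(cache_text, requested_generator):
    """True if the build dir was already generated with another generator."""
    existing = cached_generator(cache_text)
    return existing is not None and existing != requested_generator


cache = "CMAKE_GENERATOR:INTERNAL=Unix Makefiles\nCMAKE_GENERATOR_PLATFORM:INTERNAL=\n"
print(cached_generator(cache))
print(needs_fresh_build_dir(cache, "Ninja"))
```

If the check says a fresh directory is needed, the throwaway temp directory
mentioned above is one place to generate into.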

> Another benefit of skipping actual generation is possibly better error
> recoverability. That is, some generators may fail here and there if the
> project is incorrectly configured (e.g. source files are missing).
> Skipping the generation phase will (probably) help getting the project
> metadata even in that case.

I don't think that's the case. The 'generate step' is the point where the 
metadata is generated, and that step begins strictly after the 'configure 
step' ends. The scenario you describe is errors during the 'configure step'. 
That means no metadata for you.

If this is possible to change, it's out of scope of this current design 
work. I'd suggest a separate bug report.

> But anyway, it seems a little outside of the scope of the discussion.

Yep :).

>> * How much information does tooling need about installation?  Targets
>>  can use different include directories and compile definitions in their
>>  install locations compared to their build locations.  If IDEs want to
>>  provide some user interface related to the project files in their
>>  install location, perhaps a separate solution based on cmExportFile*
>>  is needed.  For future investigation.
> 
> 
> An additional though: here only the 'project information' aspect is
> discussed; though, to be fully machine-frienly, cmake should be able to
> also generate parseable output (error reports etc), provide the progress,
> etc. So, just to mull over, probably the discussed design should consider
> such future direction.

Ok. It is also orthogonal to the metadata of the build itself and can be 
designed separately.

I filed

 http://public.kitware.com/Bug/view.php?id=15463

if you want to engage in the design or implementation of that.

Thanks,

Steve.