[cmake-developers] [Review Request] New module: IncludeUrl

Thu Oct 8 14:28:33 EDT 2015

Hello Brad,

thanks for your comments.

On 08/10/2015 17:48, Brad King wrote:
> I wish you had come asking about this proposal to hold discussion
> before going through all the work to implement this.

We've been successfully using this module for about 2 years now and we
will continue to use it anyway, so I did not waste time... We just
thought that this module could be useful for other people, so most of
the work was just adding the unit tests, and it was worth doing it,
because I ended up fixing several bugs that came out. ;)

> I do not think this is a good idea as proposed, at least for
> deployment with upstream CMake.

I understand this is quite a controversial module, but I would like to
stress that this is something that can already be done using CMake just
by executing file(DOWNLOAD) and include(); this module just makes it
easy to do. Whether this is a security issue or not depends on how the
module is used.
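
For illustration, a minimal sketch of the manual approach. This is not
the module's actual implementation, and the URL and file names are
hypothetical placeholders:

```cmake
# Hypothetical URL and local path, for illustration only.
set(_url "https://example.com/cmake/MyCommonSettings.cmake")
set(_file "${CMAKE_BINARY_DIR}/downloaded/MyCommonSettings.cmake")

# STATUS receives a two-element list: a numeric code and a message.
file(DOWNLOAD "${_url}" "${_file}" STATUS _status)
list(GET _status 0 _code)
list(GET _status 1 _msg)
if(NOT _code EQUAL 0)
  message(FATAL_ERROR "Downloading ${_url} failed: ${_msg}")
endif()

# Only include the file once the download is known to have succeeded.
include("${_file}")
```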

Also note that the ExternalProject module does something that is
conceptually very similar, i.e. downloads some code from somewhere on
the internet, and includes it in your project. Unfortunately
ExternalProject does not allow you to do it at configure time*, but only
at build time, which means you cannot, for example, use a module from ECM
in your build system unless it is installed.

* Actually I managed to do that, but it is quite an ugly solution that
requires ~100 lines of CMake code.

>> The main use case for such a module is for groups that have several
>> projects or CMake scripts, handled by different developers, that import
>> the same CMake module that is sometimes updated, and they want to keep
>> these files synchronized in all the projects. This is very hard to
>> achieve when the developers are many and don't care too much about the
>> build system. Instead of adding this file to each project, this module
>> allows to put it somewhere, and automatically download and include it
>> when required.
> 
> This means the module is not versioned with the includers and could
> break them with an update.

Of course this depends a lot on the content of the file. A simple file
containing just a few variables is unlikely to break with an update. A
more complicated file containing macros or functions might break, but
this could also happen if you have your module in some external package
and you update the package version.

> If EXPECTED_HASH is used then the includer must be modified whenever
> there is an update anyway.

Yes, indeed. This is useful if you depend on a specific version of a
file and don't want to use a new version even if the module is updated.
For example you could use a raw file at a specific git revision if you
want to be sure that the file you are using will never change.
Of course you could include it in your project, but this increases code
duplication* and makes it a lot less clear what is happening. Where does
the file come from? Which version is it? Was it updated upstream and if
it was, where can I get the latest version of the file? Does it have
local changes?
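
A minimal sketch of pinning a file to a fixed revision with plain
file(DOWNLOAD). The URL, its <revision> part, the file name and the hash
value are all placeholders to be filled in; none of them comes from a
real project:

```cmake
# Hypothetical raw-file URL that names one specific git revision.
set(_url "https://example.com/repo/raw/<revision>/cmake/MyHelpers.cmake")
set(_file "${CMAKE_BINARY_DIR}/downloaded/MyHelpers.cmake")

# Placeholder: replace with the real SHA256 of the file at that revision.
set(_sha256 "<sha256-of-the-file>")

# The command fails with an error if the downloaded content does not
# match the expected hash, so the include() below is never reached
# with unexpected content.
file(DOWNLOAD "${_url}" "${_file}" EXPECTED_HASH SHA256=${_sha256})
include("${_file}")
```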

* It is quite interesting to note that in order to reduce the code
duplication, we duplicated this module in several places: a good reason
for wanting it upstream! :)

> If EXPECTED_HASH is not used then you're running code downloaded over a
> network with no chance to check it.

You can still use the TLS_VERIFY option to verify the https server
certificate and ensure that the origin of the file is the one you are
expecting.
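
As a rough sketch with plain file(DOWNLOAD) (the URL and file name are
hypothetical), certificate checking is switched on like this. Note that
it authenticates the server only and says nothing about the content of
the file:

```cmake
# Hypothetical https URL and local path, for illustration only.
set(_url "https://example.com/cmake/MyCommonSettings.cmake")
set(_file "${CMAKE_BINARY_DIR}/downloaded/MyCommonSettings.cmake")

# TLS_VERIFY ON asks for the server certificate to be verified.
file(DOWNLOAD "${_url}" "${_file}" TLS_VERIFY ON STATUS _status)
list(GET _status 0 _code)
list(GET _status 1 _msg)
if(NOT _code EQUAL 0)
  # A certificate problem is reported here like any other failure.
  message(FATAL_ERROR "Downloading ${_url} failed: ${_msg}")
endif()

include("${_file}")
```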

Also, the same thing could happen with a package downloaded using
ExternalProject: if someone added malicious code to the CMakeLists.txt
of one project, then when you run "make" the project is downloaded and
configured, and the malicious code is executed without giving you the
chance to check it.

> Instead the common files can come with some external package and found
> with find_package.  Then at least some versioning can be done.  See
> KDE's extra-cmake-modules for example.

That means adding an extra dependency to the build system. The user
will have to install it _before_ installing your software, and keep
it updated. Also, having modules that depend on different versions of
the package becomes complicated.

In a research environment, the software development cycle is heavily
driven by project or paper deadlines. Academic research products are
scientific papers, not code. Researchers are not software engineers, and
they don't want to waste time on the build system or on handling
dependency issues.
Imagine you create a demo module doing something very cool that uses one
module from package X. You commit your code somewhere, write a paper,
and then forget about it for years. Time passes, and at some point
someone else reads your paper and wants to try it, but meanwhile they
have several other packages that depend on a newer version of X. Since
your code won't compile, they will not waste time on it, they will not
try it, and of course they will not cite your paper in their work.

Another possible application is for CDash build machines. In CMake you
have a script that describes the machine and the build flags, and then
just includes a cmake_common.cmake script that must be downloaded and
updated by the user (even though it's not updated very often).
If for some reason in your project you have to change this script often
(for example because you use CDash with subprojects, and you must update
the list of the modules whenever a project is added or removed), using
this module you could ensure that all the machines are always building
using the latest version of the script without manual intervention.
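
A rough sketch of what such a per-machine script could look like using
only file(DOWNLOAD) and include(). The site name, build name and URL are
hypothetical, and the script is assumed to be run with "ctest -S":

```cmake
# Per-machine settings (hypothetical values).
set(CTEST_SITE "my-build-machine")
set(CTEST_BUILD_NAME "Linux-gcc-Release")
set(CTEST_CMAKE_GENERATOR "Unix Makefiles")

# Hypothetical location of the shared script.
set(_common_url "https://example.com/ci/cmake_common.cmake")
set(_common_file "${CTEST_SCRIPT_DIRECTORY}/cmake_common.cmake")

# Fetch the latest copy on every run, so no manual update is needed.
file(DOWNLOAD "${_common_url}" "${_common_file}" STATUS _status)
list(GET _status 0 _code)
list(GET _status 1 _msg)
if(NOT _code EQUAL 0)
  message(FATAL_ERROR "Could not fetch ${_common_url}: ${_msg}")
endif()

# The common script then drives the actual dashboard build.
include("${_common_file}")
```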

I hope I made myself clear about the reasons why I believe this is a
useful module and a good candidate for being upstream, but if you still
think that this is not a good idea, I will withdraw my proposal, no
offence taken :)

Cheers,
 Daniele