[CMake] file( DOWNLOAD ) problem

Marcus D. Hanwell marcus.hanwell at kitware.com
Fri Sep 28 17:09:29 EDT 2012


On Fri, Sep 28, 2012 at 4:42 PM, Robert Dailey <rcdailey.lists at gmail.com> wrote:
> On Fri, Sep 28, 2012 at 2:58 PM, David Cole <david.cole at kitware.com> wrote:
>> On Fri, Sep 28, 2012 at 3:30 PM, Robert Dailey <rcdailey.lists at gmail.com>
>> wrote:
>>>
>>> CMake downloads our third party libraries from a central repository
>>> and we have a "manifest.cmake" module where we define the following:
>>>
>>> - Library alias (the library's base name, such as "boost", "bdb",
>>> "openssl")
>>> - Library version (e.g. 2.1.5)
>>> - Library iteration (A counter that is incremented if a library
>>> changes remotely without version # increasing (such as if we rebuild
>>> the same version of the library and it must be re-served))
>>>
>>> My third party download logic knows to download the following files:
>>>
>>> <repo>/<alias>/<version>/include.7z
>>> <repo>/<alias>/<version>/<platform>.7z
>>>
>>> In this case, platform will represent the toolchain -- such as
>>> vc9sp1.7z for the lib & bin files for visual studio 2008 SP1.
>>>
>>> I have 2 files here, so I'd need 2 MD5 values recorded in my manifest
>>> somewhere, but since I have 1 line per "library" (not per file that
>>> will be downloaded) it wouldn't work out very well.
>>>
>>> I want to keep my manifest simple and easy to look at and modify,
>>> adding a bunch of MD5 values will make it messy and harder to upgrade
>>> libraries (right now I just drop files on a server and add or modify a
>>> line in the manifest. Having MD5s would mean I would have to run
>>> another tool to calculate the MD5 and then stuff it somewhere in the
>>> manifest module)
>>>
>>> If you have some ideas on how to make this fit well into my system I'm
>>> all for that, but I guess if not then I'll have to rely on assumptions
>>> :(
>>>
>>> However I strongly believe that CMake's file DOWNLOAD should do more
>>> checks to make sure that the data received is valid. I will look at
>>> the code later to see if there is more that can be done.
>>>
>>> On Wed, Sep 26, 2012 at 11:20 PM, David Cole <david.cole at kitware.com>
>>> wrote:
>>> > On Wed, Sep 26, 2012 at 7:32 PM, Robert Dailey
>>> > <rcdailey.lists at gmail.com> wrote:
>>> >> To do MD5 checks, I need to somehow record the expected MD5 somewhere,
>>> >> which isn't very maintainable.
>>> >>
>>> >> I provide a list of third party libraries that CMake should download
>>> >> from a central third party repository here at work. It is a trusted
>>> >> source, because we know it is, so we don't need to verify the MD5.
>>> >> However, if I could request the MD5 first, and then download, then
>>> >> compare the MD5 the server gave me with what I actually downloaded,
>>> >> that would certainly work just to verify the complete file was
>>> >> downloaded.
>>> >>
>>> >> Other than that, I'll have to rely on the status of the operation...
>>> >> but I don't like that the destination file is created prior to any
>>> >> writes being possible by CMake (it can't write anything if no data was
>>> >> received, so why doesn't it create the file once it has a write
>>> >> buffer?)
>>> >>
>>> >
>>> > Recording the MD5 somewhere is the only way to have a reasonable
>>> > re-assurance that what you've asked for is what you're getting from a
>>> > network operation. It seems to me that it could be made "maintainable"
>>> > if you centralize the knowledge of the checksums in a file that is
>>> > changed whenever any of the downloadable files is changed.
>>> >
>>> > I guess we figure it's no use downloading bits over the network if you
>>> > can't even open a (presumably local) output file for writing... so we
>>> > try to open the output file for writing first, and if it succeeds,
>>> > then we start grabbing bits from the network and writing them into the
>>> > file as we receive them.
>>> >
>>> > There is room for improvement in the file(DOWNLOAD) implementation,
>>> > but it is the way it is right now (and will be for 2.8.10 as well...)
>>> >
>>> > Start proposing improvements for it now, and submitting patches to
>>> > make stuff better for 2.8.11 and/or beyond. :-)
>>> >
>>> >
>>> > HTH,
>>> > David
>>
>>
>>
>> You can rely on the STATUS to see if there were any errors during the
>> download. If the error code is 0, then you got whatever was on the server.
>> You can rely on that.
>>
>> So, if you don't want to use a hash, you can rely on STATUS. I do not know
>> of any case that reports a "0" status code, but gives an incorrect file
>> result.
>>
>>
>> What you *can't* rely on is that the correct thing was on the server. And to
>> validate that, you should use checksums of some sort. (If you can't or don't
>> want to, that's fine. To each his own.) Starting with CMake 2.8.10, there
>> will be EXPECTED_HASH and you can use the hashing algorithm of your choice
>> rather than just the MD5 that we had in 2.8.9 and earlier...
>>
>> Also new in 2.8.10, the Kitware provided pre-built binaries will link to
>> OpenSSL such that we can handle downloading files from "https://" URLs.
>
> In my tests, I've found that redirects can affect the return code of
> STATUS. For example, if I try to initiate a download of a file that
> doesn't really exist, the HTTP server may return a "dummy" file. In
> that case it would be downloaded just fine no matter what the URL or
> filename is, and STATUS wouldn't know the difference.
>
> However for FTP URLs, it is generally more honest (since HTTP can do
> funny things, like lie to you).

That is why we have generally found the use of hashes to be more
reliable. Even with FTP, downloads can be interrupted/mangled from
time to time, even if you assume no malicious interference on a
trusted network.
Most Linux distributions that download source or binary packages also
use hashes to verify the file was retrieved correctly.
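
As a rough sketch of what that could look like in a CMake script: the
helper name download_verified, the URL, the destination path and the
hash in the usage comment are all made-up placeholders, not taken from
Robert's setup. It checks STATUS first and then compares the MD5 of
what actually landed on disk.

```cmake
# Hypothetical helper (not part of CMake): download one file and stop
# with an error unless the transfer succeeded and the MD5 matches.
function(download_verified url dest expected_md5)
  file(DOWNLOAD "${url}" "${dest}" STATUS status)

  # STATUS is a two-element list: numeric code, then a message string.
  list(GET status 0 code)
  list(GET status 1 text)
  if(NOT code EQUAL 0)
    file(REMOVE "${dest}")  # do not leave an empty or partial file behind
    message(FATAL_ERROR "Download of ${url} failed: ${text}")
  endif()

  # A zero status only says the transfer completed; hash the result to
  # confirm it is the file we expected (and not, say, an error page).
  file(MD5 "${dest}" actual_md5)
  string(TOLOWER "${expected_md5}" expected_md5)
  if(NOT "${actual_md5}" STREQUAL "${expected_md5}")
    file(REMOVE "${dest}")
    message(FATAL_ERROR
      "MD5 mismatch for ${url}: expected ${expected_md5}, got ${actual_md5}")
  endif()
endfunction()

# Example call; every value here is a placeholder:
# download_verified(
#   "http://repo.example.com/somelib/1.0.0/include.7z"
#   "${CMAKE_CURRENT_BINARY_DIR}/downloads/somelib/include.7z"
#   "0123456789abcdef0123456789abcdef")
```

With 2.8.10 the separate file(MD5) step should become unnecessary,
since EXPECTED_HASH MD5=<value> can be passed to file(DOWNLOAD)
directly.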

It is great to know SSL will be supported in the next release - more
and more download sites are using SSL by default now.

Marcus
