FetchContent: allow pre-fetching sources and generating source tarballs
I would like to propose a major FetchContent feature for evaluation.
Motivation
FetchContent is an awesome CMake feature, but I've been bitten by several of its drawbacks in the last few months:
- After all the dependencies have already been fetched and built, I'm completely unable to build the project offline (say, on a train) after merely rebasing/squashing some commits.
  - If one of my commits touches some trivial part of `CMakeLists.txt`, `cmake` will detect a change, force a reconfigure, and insist on re-fetching all the dependencies. It then fails due to the lack of an internet connection, so I won't even be able to run `cmake --build .`, even if all the sources are still exactly the same as before.
- In the package manager where I'm active (MacPorts) the usual workflow is to: (1) fetch sources (possibly from project mirrors), (2) verify checksums, (3) configure, (4) build, (5) destdir installation + archive, (6) final installation/activation. All the steps except for fetching the sources can/should be performed offline. The fact that the configure step needs internet access, that the sources cannot be mirrored, and that they cannot be checksummed by the package manager (yes, I know that FetchContent can also do checksumming, but that requires modifying upstream sources) is a very strong anti-pattern that causes quite a few issues.
- Dependencies (upstream projects, or even our own dependencies) happen to rewrite history, change tags, move locations (say, move from GitHub to GitLab), disappear, ... What builds just fine today may no longer build one month or a year from now when I need to build it again, so our own git history becomes semi-useless and it may take a non-trivial effort to figure out how to get an old commit to build.
- While the above problem is not so serious with one level of dependencies, where it might suffice to simply change a tag somewhere in the repository, it becomes a nightmare when a dependency of a dependency of a dependency of a dependency refuses to fetch.
- We were not aware that combining `GIT_SHALLOW ON` with a commit hash (rather than a tag) is a non-option. So now a few months of our repo history are broken: they cannot be built out of the box without patching the sources of a dependency of a dependency, so that the final dependency gets fetched successfully with `GIT_SHALLOW OFF`.
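The `GIT_SHALLOW` pitfall from the last bullet can be illustrated with a minimal sketch. The dependency name `somedep`, the repository URL, and the hash placeholder are hypothetical and must be replaced with real values. It assumes the behavior described in the ExternalProject documentation for the CMake versions this issue concerns, where a shallow clone only works with branch names and tags.

```cmake
include(FetchContent)

# Hypothetical dependency: name, URL and hash placeholder are illustrative only.
FetchContent_Declare(
  somedep
  GIT_REPOSITORY https://example.com/somedep.git
  # Pin to a full commit hash so that a moved or deleted tag cannot
  # silently change what gets built. Replace the placeholder below.
  GIT_TAG        <full-commit-hash>
  # Keep the clone non-shallow (OFF is also the default): a shallow clone
  # is documented to work only with branch names and tags, not commit hashes.
  GIT_SHALLOW    OFF
)
FetchContent_MakeAvailable(somedep)
```

Pinning to a hash protects against retagging, but not against the repository moving or disappearing, which is the part that still needs mirroring.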
We used to use git submodules before, and they have their own set of issues (most importantly, if a dependent project relocates or changes history, it is pretty much a hopeless situation when trying to build an old commit; and when dependencies are a moving target that needs constant updates, resolving merge conflicts for the commit hashes of dependencies is a nightmare), but there's a very important difference compared to FetchContent:
- There's a single reliable command (`git submodule update --init --recursive`) that can pre-fetch all the sources, and from that point on any builds may happen offline.
- That means that any project using git submodules can easily create standalone source tarballs with zero external dependencies. And that's what most of the projects actually do when publishing releases on GitHub (lacking a proper statistical analysis, this is an "experimental claim" based on experience with packaging various software within MacPorts: fetching an auto-generated tarball of a random commit is not sufficient to build the project, but fetching from releases works out of the box).
- Even when an upstream project doesn't provide "standalone" release binaries, MacPorts can easily generate a full source tarball from a git repository during the fetch phase and the CI can easily put it on our source mirror.
- Even ignoring the CI job that does source mirroring: when I'm packaging new software I might end up building it 10 or 20 times before I succeed. If all the sources that I need are fetched during the first fetch phase (which is also possible with git submodules), those sources get stored and then extracted into a fresh build directory on the next build attempt. If, however, I need to fetch hundreds of megabytes during the configure phase, this needs to be done over and over again for every single build attempt.
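For comparison, the closest existing FetchContent counterpart to "extract pre-fetched sources into a fresh build directory" appears to be the documented `FETCHCONTENT_SOURCE_DIR_<uppercaseName>` override. Below is a sketch of an initial-cache script a packager could load with `cmake -C`. The file name, the dependency names `SOMEDEP` and `OTHERDEP`, and the paths are hypothetical.

```cmake
# packager-cache.cmake (hypothetical file name), loaded with:
#   cmake -C packager-cache.cmake -S path/to/source -B build

# Point each dependency at sources that were fetched, checksummed and
# extracted beforehand. With these set, FetchContent performs no download
# or update step for the named content.
set(FETCHCONTENT_SOURCE_DIR_SOMEDEP  "/path/to/extracted/somedep"
    CACHE PATH "Pre-extracted sources for the hypothetical 'somedep'")
set(FETCHCONTENT_SOURCE_DIR_OTHERDEP "/path/to/extracted/otherdep"
    CACHE PATH "Pre-extracted sources for the hypothetical 'otherdep'")

# Assumption: every dependency, including transitive ones, is covered by an
# override above, so nothing should need the network during configure.
set(FETCHCONTENT_FULLY_DISCONNECTED ON
    CACHE BOOL "Never download or update during configure")
```

The catch is that the packager has to know the name of every dependency in advance, including dependencies of dependencies, which is exactly the bookkeeping a single pre-fetch command would remove.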
Proposal
I would like to see a command similar to this:
cmake --fetch [options] path/to/source
that would be able to prefetch all the sources and eventually allow packing them into a standalone source tarball.
Additionally it would be extremely helpful if there was a way to avoid repeatedly checking for dependency updates if they have already been fetched before.
(That said, maybe I just need to go through some relatively convoluted sources: I'm not excluding the possibility that FetchContent or some other aspects of our build system are simply misconfigured, so that they keep re-fetching stuff all the time when even a tiny bit of `CMakeLists.txt` gets touched.)
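One existing knob that may already cover part of the "avoid repeatedly checking for updates" request is `FETCHCONTENT_UPDATES_DISCONNECTED`. A minimal sketch as an initial-cache script follows. The file name and the dependency name `SOMEDEP` are hypothetical, and whether this fully removes the re-fetching described above depends on how the project declares its dependencies.

```cmake
# no-updates.cmake (hypothetical file name), loaded with:
#   cmake -C no-updates.cmake -S path/to/source -B build

# Skip the update step for content that has already been populated.
# The first configure still downloads anything that is missing.
set(FETCHCONTENT_UPDATES_DISCONNECTED ON
    CACHE BOOL "Do not re-check dependencies that are already fetched")

# The same switch exists per dependency, for example for a hypothetical
# dependency declared as 'somedep':
# set(FETCHCONTENT_UPDATES_DISCONNECTED_SOMEDEP ON CACHE BOOL "")
```

This only helps once the content is present in the build tree. It does not provide a separate fetch phase, so the proposal above still stands.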