API for dependency providers (find_package() and FetchContent)
This is the previously mentioned complement to #21687 (closed). Apologies for the length. A number of things come together in this proposal, and I wanted to be clear on the different aspects to address discussion points I've already had with various people and stakeholders. I don't expect the actual implementation to be all that complex.
Context
The find_package() and FetchContent_MakeAvailable() commands are two primary ways that projects bring dependencies into their build. The former generally expects the packages to be pre-built and provided externally, while the latter typically builds the dependencies as part of the project.
A significant proportion of CMake users would like to make use of package managers to provide these dependencies. Different package managers choose different strategies for how to provide this (some provide multiple methods), each with their own advantages and disadvantages. Very few support both find_package() and FetchContent_MakeAvailable() equally well; most tend to support only the former. Some require modifying the project to call specific commands or load certain files to set up the project for the package manager, while others require the user to use a toolchain file that the package manager provides. It has been common for package managers to redefine find_package() to do their own logic first before potentially forwarding through to the built-in implementation, even though this has never been officially supported and relies on undocumented behavior. Methods continue to evolve.
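For illustration, the override technique described above typically looks something like the sketch below. It relies on the undocumented detail that redefining a built-in command leaves the original reachable under an underscore-prefixed name. The package name hypothetical_dep and the body of the provider branch are placeholders, not taken from any particular package manager.

```cmake
# Unsupported pattern, shown only to illustrate the problem.
# A macro is used rather than a function so that variables set by the
# real find_package() land in the caller's scope.
macro(find_package)
  if("${ARGV0}" STREQUAL "hypothetical_dep")
    # Package-manager-specific logic would go here (placeholder).
    set(hypothetical_dep_FOUND TRUE)
  else()
    # Forward everything else to the original built-in command, which
    # CMake exposes under an underscore-prefixed name after the override.
    _find_package(${ARGV})
  endif()
endmacro()
```

The approach is fragile: if a second party overrides the command in the same way, the underscore-prefixed name no longer refers to the built-in implementation.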
A project should not have to hard-code a choice of dependency provider. If they do, they lock themselves out of being used with other projects that might choose a different provider, or people with constraints on providers may be prevented from using an otherwise suitable project. Users should be able to switch between dependency providers, as long as each provider does in fact provide the dependencies they need. In addition, projects might need a mix of dependency providers to satisfy internal constraints. This would be more common in organisations where some dependencies might be internal and have tight access restrictions (e.g. IP protection, legacy systems, export controls, and so on). They may have some things that need custom, in-house logic to provide them, but others can be provided by commercial off-the-shelf applications. For these scenarios, the ability to get some packages via one mechanism or provider and other packages from another could be a very important need.
An often missed or under-appreciated point is that it is quite common for projects to provide controls for turning on or off different parts of the build. This could be turning on/off things like tests, building examples, enabling or disabling specific features, selecting between different implementations of a particular interface, and so on. These can all impact the set of dependencies that are needed. The project will request that a dependency be provided only after determining that it is needed. This avoids having to pull down potentially many and/or huge dependencies that ultimately wouldn't be needed. For CI systems that frequently build from scratch, this could be a critical pain point.
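As a minimal sketch of this pattern, a project might gate its dependency requests on options as shown below. The project name and option names are hypothetical; the point is only that find_package() is reached for a dependency after the project has decided it is needed.

```cmake
cmake_minimum_required(VERSION 3.14)
project(example LANGUAGES CXX)

# Hypothetical feature switches, both off by default.
option(EXAMPLE_BUILD_TESTS "Build the test suite" OFF)
option(EXAMPLE_WITH_COMPRESSION "Enable the compression feature" OFF)

if(EXAMPLE_WITH_COMPRESSION)
  # Only requested when the feature is enabled.
  find_package(ZLIB REQUIRED)
endif()

if(EXAMPLE_BUILD_TESTS)
  enable_testing()
  # Only requested when tests are being built.
  find_package(GTest REQUIRED)
endif()
```

With both options left off, neither dependency is ever requested, so a provider has nothing to download or build.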
Proposed New Capabilities
Dependency Request Interception
CMake lacks a well-defined, supported way for dependency providers to intercept calls to find_package() and FetchContent_MakeAvailable(). Both commands could relatively easily support an approach of "try satisfying this request using some list of providers, and if none provide it, continue on with the usual logic to provide it ourselves". Projects would need no changes. Dependency providers would register themselves (see next item) and CMake would handle forwarding requests internally.
Dedicated Provider Injection Point
Dependency providers need a way to register themselves with CMake, so that CMake knows where it can potentially forward on the dependency requests it receives from projects. This needs to happen early enough to support what some dependency providers may want to do, but not so early that some important things are not defined yet. Some providers want to control the toolchain details, potentially setting or overriding the CMAKE_TOOLCHAIN_FILE, which means they need a point before the first project() call starts doing things. Some may need to access the source or build directories, so things like CMAKE_SOURCE_DIR and CMAKE_BINARY_DIR need to be available. The value of existing cache variables should also be available.
The logical place for an injection point therefore seems to be the very first project() call. We currently have CMAKE_PROJECT_INCLUDE_BEFORE and CMAKE_PROJECT_<PROJECT-NAME>_INCLUDE_BEFORE, but those are meant for users to implement arbitrary custom steps and we shouldn't take over those variables. The former also gets used for every project() call, not just the first. The second would require users to know the name of the top level project, which would be an annoyance as they move between projects and want to apply the same setup steps to each one. Instead of trying to (ab)use one of those existing variables, I propose we add a dedicated injection point which is the first thing a project() command does before pulling in any other file (a custom user include, toolchain, etc.).
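To make the shortcoming of the existing variable concrete, here is a sketch of the kind of guard a setup file needs today if it is passed via CMAKE_PROJECT_INCLUDE_BEFORE and must only act once. The file name setup.cmake and the property name HYPOTHETICAL_SETUP_DONE are assumptions made up for this example.

```cmake
# setup.cmake, passed as -DCMAKE_PROJECT_INCLUDE_BEFORE=/path/to/setup.cmake
# This file is included by every project() call, so it has to guard itself.
get_property(already_done GLOBAL PROPERTY HYPOTHETICAL_SETUP_DONE)
if(already_done)
  # A later project() call: skip the one-time setup.
  return()
endif()
set_property(GLOBAL PROPERTY HYPOTHETICAL_SETUP_DONE TRUE)

# One-time setup would go here (placeholder).
message(STATUS "One-time setup for ${CMAKE_SOURCE_DIR}")
```

A dedicated injection point that only fires for the first project() call would make this kind of guard unnecessary.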
Proposed API Changes
NOTE: See updated proposal in #22619 (comment 1184153) which has some changes from the original description here.
- Add a new variable accepting a list of files to be included by the first project() command. As a working name for discussion, I'll propose CMAKE_PROJECT_SETUP (also see Observations section for context). We could also support an environment variable of the same name which is used if there is no CMake variable of that name defined (i.e. similar to how CMAKE_TOOLCHAIN_FILE works).
- Add a new cmake_language() subcommand for registering event handlers (see #22466 for a dedicated issue around this). In this proposal, we would be defining event types for dependency providers. For discussion, I propose the syntax cmake_language(EVENT <type> <command-name>). Allowed values for <type> would be FIND_PACKAGE and FETCHCONTENT_MAKEAVAILABLE, with flexibility to add new types in the future if needed (examples of which are discussed in #22466). The <command-name> would be called with context-specific arguments for the associated event type (see below). I propose that we do NOT provide a public API for deregistering an event handler, querying the set of registered handlers or other handler-related manipulation in the initial implementation, to contain scope.
- Add a new command to FetchContent for retrieving the "winning" declared dependency details for a given dependency (there is already an internal command which does most of what is required). This allows a provider to retrieve the declared details when given only a dependency name by FetchContent_MakeAvailable(). Note that these details should omit any OVERRIDE_FIND_PACKAGE or FIND_PACKAGE_ARGS (both added in !5688 (merged) for #21687 (closed)), since those relate to call re-routing that CMake itself manages (the providers will be given an opportunity to provide the dependency before that re-routing).
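The lookup order suggested for the first item could be emulated in script form roughly as follows. This is only an illustration of the intended precedence (CMake variable first, environment variable as fallback); under the proposal CMake would do this internally during the first project() call, and CMAKE_PROJECT_SETUP is just the working name used in this discussion.

```cmake
# Emulation only: CMAKE_PROJECT_SETUP is a proposed name, not an existing
# CMake variable. The CMake variable takes precedence over the environment.
if(DEFINED CMAKE_PROJECT_SETUP)
  set(setup_files "${CMAKE_PROJECT_SETUP}")
elseif(DEFINED ENV{CMAKE_PROJECT_SETUP})
  set(setup_files "$ENV{CMAKE_PROJECT_SETUP}")
else()
  set(setup_files "")
endif()

# Include each listed file in order, before anything else happens.
foreach(setup_file IN LISTS setup_files)
  include("${setup_file}")
endforeach()
```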
The command for a FIND_PACKAGE provider would receive the exact same arguments as were passed to the find_package() call. In order to communicate whether it found the requested dependency, we could require that it sets ${CMAKE_FIND_PACKAGE_NAME}_FOUND to false if it does NOT provide it. This would be pretty close to how config packages already work and may ultimately be easy for both providers and CMake. Alternatively, we could make the first argument be the name of an output variable they must set to indicate whether they provided the dependency or not. In the absence of any other opinions, I'd probably go with the former approach rather than the latter.
The command for a FETCHCONTENT_MAKEAVAILABLE provider would receive a name of an output variable followed by a list of dependency names to be satisfied. The command would need to set the output variable to the list of dependencies it did NOT provide. This has the advantage that for both provider types, the provider has to set something to say "I did not provide this", so it is more consistent. Note also that by giving the provider a list of dependencies rather than one at a time, the provider is free to work in parallel if it knows it is safe to do so. This has the potential to provide significantly faster dependency handling compared to find_package(), which only accepts one dependency at a time. Note that we would only provide a list of dependencies if the project itself did so. The order of the dependencies in the list might be important, depending on how the provider works. Some of the dependencies may themselves depend on others in the list, so the provider must honour the ordering if it would make a difference to how dependencies get populated/built, etc.
Example
my_dep_providers.cmake:
    function(my_find_package_provider)
      # ${ARGN} is expected to be the full set of arguments given to the `find_package()` call
      # ... Does its thing and only sets ${CMAKE_FIND_PACKAGE_NAME}_FOUND to false if not provided
    endfunction()

    function(my_FetchContent_provider out_var)
      # ${ARGN} would be a list of dependency names
      # ... Does its thing, then sets out_var in parent scope to the dependencies it did not provide
    endfunction()

    cmake_language(EVENT FIND_PACKAGE my_find_package_provider)
    cmake_language(EVENT FETCHCONTENT_MAKEAVAILABLE my_FetchContent_provider)
A user might invoke CMake like this:
cmake -DCMAKE_PROJECT_SETUP=/path/to/my_dep_providers.cmake ...
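To show how the "report what was NOT provided" contract for a FETCHCONTENT_MAKEAVAILABLE provider might play out, here is a small sketch that can be run with cmake -P. It does not use the proposed cmake_language(EVENT ...) registration; the direct call at the bottom merely stands in for what CMake would do internally. The provider name, the dependency names and the known_deps list are all hypothetical.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical provider: claims only the dependencies it knows about and
# reports the rest back through the output variable.
function(my_fetchcontent_provider out_var)
  set(known_deps fmt zlib)  # placeholder for the provider's own catalogue
  set(not_provided "")
  # Walk the requested names in the order given, since order may matter.
  foreach(dep IN LISTS ARGN)
    if(dep IN_LIST known_deps)
      message(STATUS "provider: supplying ${dep}")
    else()
      list(APPEND not_provided "${dep}")
    endif()
  endforeach()
  # Hand back the names this provider did NOT satisfy.
  set(${out_var} "${not_provided}" PARENT_SCOPE)
endfunction()

# Stand-in for the internal forwarding CMake would perform.
my_fetchcontent_provider(remaining fmt zlib catch2)
message(STATUS "left for the built-in logic: ${remaining}")
```

In this sketch, catch2 is the only name left in remaining, so it is the only dependency that would fall through to the usual FetchContent logic.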
Observations
Nothing in this proposal changes existing behavior if no dependency providers are set. Users can continue to use whatever dependency managers they currently use without impact.
Package providers have both an opportunity to shape the configuration environment (at setup injection time) and to query the environment at the point where any dependency is requested. Thus, those package providers that want to control the build can do so, and those that want to adapt what they provide to the environment as set up by the user can also do so. The above doesn't dictate the choice. The providers themselves are the ones that will place restrictions on the user, so the user is free to decide if they accept any such restrictions or find a different package provider that better suits their needs.
CMake's responsibility in this proposal would be to give the provider the "what", but not the "how". CMake also doesn't need to check what is provided, only whether the provider said it provided it or not. It is the provider's responsibility to provide a suitable dependency to the project. Note that for some providers, this might mean they ignore some or all of the information about the dependency (git repo, commit hash, find_package() components, etc.). Some providers maintain a coherent set of packages and the user might be selecting the package set as part of the setup stage. It is the user's responsibility to understand how the provider uses or does not use the information about the dependencies and ensure that is compatible with their project's needs. This is no different to what users already must do when using any of the established package managers at the moment.
The CMAKE_PROJECT_SETUP injection point may well be useful for other scenarios. While it is used for registering dependency providers in this proposal, it may also be a logical place to register other providers or perform one-time setup injected from outside the project. The existing CMAKE_PROJECT_INCLUDE_BEFORE and CMAKE_PROJECT_<PROJECT-NAME>_INCLUDE_BEFORE variables don't quite achieve that because they either get used for every project() call or they require the name of the top level project and therefore hinder re-using setup scripts across projects.
Adding a provider for FindPkgConfig may also be something people might want to see at some point. I didn't want to tackle that in this first iteration, as it is already big and I personally feel there's more value in tackling the two that I've chosen to focus on here. The current proposal is extensible and should be able to accommodate such a direction in the future if required.
This proposal does not restrict efforts related to CPS (Common Package Specification). Should CPS gain traction, it could be incorporated as another provider type, or we might choose to translate its information into one of the existing types. We have the flexibility needed to consider different approaches.