CMake 3.16 orders of magnitude slower in the Generate phase for Makefiles compared to older versions
I'm consulting a company on how to speed up their builds and I immediately pointed them to precompiled headers and unity builds - the 10 minute full build could easily drop to 2-3 minutes. Luckily CMake 3.16 was recently released and it supports both of those, so I told them to upgrade.
The problem is the following: once they switched from CMake 2.6 to 3.16, the time it took to run CMake jumped from about 20 seconds to more than 10 minutes. Most of the time is spent in the generate phase. It does complete successfully if you give it enough time, and the code compiles successfully with unity builds, but this CMake run time is unacceptable.
Here is their setup:
- CMake 2.6, old style CMake with global flags/defines/includes - not modern (target-based). Nothing too fancy - no custom commands, custom build rules or complicated dependencies.
- the compiler used is GCC 7 and they generate Makefiles - the OS is CentOS 7, Kernel: Linux 3.10.0-862.14.4.el7.x86_64
- around 2000 .cpp files spread across 100 libraries and 600 executables (most of which are test executables with a single .cpp file)
- .cpp files are gathered/globbed with aux_source_directory - we know not explicitly listing the .cpp files is an anti-pattern, but that's beside the point - I think this is irrelevant since this should happen in the configuration step and not during the generation, correct?
Here is what we observed:
- we did a binary search through the different CMake versions and concluded that the huge slowdown happened between 3.15 and 3.16 - exactly when the precompiled header and unity build support was added, but I don't think those features have anything to do with the slowdown - I can't think of a reason for them to have such an impact - it must be some other refactoring or change...
- we disabled all tests (that means almost all 600 executables were removed) - slimming down the number of CMake targets from 700 to a bit more than 100 - and the time it took to run CMake dropped significantly, but was still a couple of times longer than with CMake 2.6 for all the 700 targets.
- we observed what was happening using strace and there were mostly access calls along with some reads and writes - but this was endless - seemed like hundreds of operations per second in a very repeatable manner. Also there was constantly an attempt to find libgflags.so which was constantly failing. Unfortunately I don't have such logs right now.
- we did a callgrind profile and here is what it looks like after a couple of minutes of running: https://i.imgur.com/Z9AObso.png (the callgrind output file can be found here) - seems like most of the time is spent in ComputeLinkLibs() and getting the full name and definitions of targets and whatnot... Is having 700 targets too much? Is this a quadratic or exponential problem? Why isn't it an issue with CMake 3.15?
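Since the strace logs were not kept, here is a rough sketch of how the observation could be captured and quantified on a re-run. The helper names count_syscalls and count_failed_lookups, the log file name cmake.strace and the source path are all placeholders of mine, and the parsing assumes the usual one-call-per-line layout that strace writes with -f -o (an optional PID column followed by the call).

```shell
# Capture a log first (not run here; ../src is a placeholder path):
#   strace -f -o cmake.strace cmake -G "Unix Makefiles" ../src
# A callgrind profile can be taken the same way:
#   valgrind --tool=callgrind cmake -G "Unix Makefiles" ../src

# Hypothetical helper: tally syscall names in an strace log, most frequent first.
count_syscalls() {
  awk '{
    line = $0
    sub(/^[0-9]+ +/, "", line)          # drop the PID column written by strace -f -o
    if (match(line, /^[a-z_0-9]+\(/)) { # the syscall name is the text before "("
      name = substr(line, 1, RLENGTH - 1)
      count[name]++
    }
  }
  END { for (n in count) print count[n], n }' "$1" | sort -rn
}

# Hypothetical helper: count log lines that mention a path fragment and failed with ENOENT.
count_failed_lookups() {
  grep -F -- "$2" "$1" | grep -c 'ENOENT'
}
```

With a log in hand, `count_syscalls cmake.strace | head` would show whether access really dominates, and `count_failed_lookups cmake.strace libgflags.so` would put a number on the repeated failed lookups.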
I couldn't find any reports of anyone else having the same problem on the internet... Any ideas what to try next? Perhaps profile using perf? Try with Ninja as a backend (there are reports of it being faster)?
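For the Ninja and perf ideas, this is roughly what I have in mind: a minimal sketch, not something we have run yet. The helper name time_run, the SRC path and the cmake.perf file name are placeholders; each run uses a fresh build directory so cached state does not skew the comparison.

```shell
# Hypothetical helper: run one command quietly and print "<label>: <seconds>s (exit <status>)".
time_run() {
  tr_label=$1
  shift
  tr_start=$(date +%s)
  "$@" > /dev/null 2>&1
  tr_status=$?
  tr_end=$(date +%s)
  echo "$tr_label: $((tr_end - tr_start))s (exit $tr_status)"
}

# Intended use (not run here; SRC is a placeholder for the source tree):
#   SRC=/path/to/source
#   time_run makefiles cmake -S "$SRC" -B "$(mktemp -d)" -G "Unix Makefiles"
#   time_run ninja     cmake -S "$SRC" -B "$(mktemp -d)" -G Ninja
# Sampling profile of the slow case with perf:
#   perf record -g -o cmake.perf cmake -S "$SRC" -B "$(mktemp -d)" -G "Unix Makefiles"
#   perf report -i cmake.perf
```

One-second resolution from date is coarse, but the difference in question is 20 seconds versus 10 minutes, so it should be enough to tell the generators apart.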