Seaworthy: Docker-based generator backend for CMake
I have had this idea in my head for a few months now and it has matured enough that I would like to start a conversation about it and see how much potential it has.
I've been doing a lot of containerization of CMake-based applications, because everybody wants containers nowadays. There are entire tomes discussing the advantages of containerization, so no need to get into that here. Just know that it is showing no sign of stopping, and if anything, the pace is picking up. Also when I say containers, this need not be Docker, though that is the first obvious implementation. Podman is promising, because no root required.
My dockerfiles usually follow this sort of flow:
```dockerfile
FROM namespace/image:tag as base_deps
# things needed at runtime or to get source code
RUN install_application_deps

# ===
FROM base_deps as compiler_stage
# things needed to compile, e.g. build-essential
RUN install_compiler_dependencies
RUN download_and_extract_source_code
RUN cd /home/proj_source && mkdir build && cd build && \
    cmake \
      -DCMAKE_BUILD_TYPE:STRING=Release \
      -Dsome_mess_of_flags \
      ..
RUN make

# ===
FROM base_deps as final
COPY --from=compiler_stage /home/proj_source/build /home/app
```
And if I don't feel like being fancy, or the project is really involved, I'll omit the `COPY --from` and just build the compile step into the image. This usually balloons the image size, though, so it's not ideal.
This is more or less the only way you can currently containerize a CMake application. Here are some of the problems with that:
- Docker caching happens at every directive step. The `make` occurs in only one `RUN` step, so the entire build process happens in a single step
- You have a conflict between CMake/the generator and Docker both trying to handle caching
- This means any compile failure sets you back to square one, and unless you do some janky hacks, you lose all your built artifacts and have to rebuild from scratch
- The `cmake` and `make` steps are not always deterministic, which breaks with the Docker philosophy of each `RUN` being totally deterministic
Instead, what I would like to see is a more granular approach to the compilation process. Down the road, there is some really neat stuff you can do with caching, build parallelization, lazy compiling, isolation, build-tree data volumes, etc., but that's far down the roadmap. The bare minimum is breaking up the `RUN make` stage into something less monolithic.
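To sketch what "less monolithic" could look like (the stage names and target names here are made up, this is a hand-written approximation of what a generator might emit), the single `RUN make` could become one cached stage per target:

```dockerfile
# Hypothetical generator output: one RUN per CMake target, so Docker's
# layer cache maps onto the dependency graph. A failure building app_exe
# no longer throws away the library layers beneath it.
FROM compiler_stage as build_core
RUN cmake --build /home/proj_source/build --target core_lib

FROM build_core as build_util
RUN cmake --build /home/proj_source/build --target util_lib

FROM build_util as build_app
RUN cmake --build /home/proj_source/build --target app_exe
```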
The first apparent solution that comes to mind would be to write a CMake generator which emits dockerfiles. Obviously Docker doesn't compile anything itself, so there would have to be `-G docker-ninja`, `-G docker-make`, etc. There is also information which needs to be fed in about the source image (probably via `-D` flags), the system architecture/ISA, and a few other technical minutiae, but for the sake of scope, I'm assuming various Ubuntu x64 flavors to start.
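For example (the generator name and cache variables below are hypothetical, not an existing CMake interface), feeding in the source image might look like:

```shell
# Hypothetical invocation: DOCKER_BASE_IMAGE and DOCKER_TARGET_ARCH
# are illustrative cache variable names, not real CMake options.
cmake -G docker-ninja \
  -D DOCKER_BASE_IMAGE=ubuntu:22.04 \
  -D DOCKER_TARGET_ARCH=x86_64 \
  -B /build -S /source
```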
Here's a rough outline of the workflow:
```shell
cmake -G docker -B /build -S /source
cd /build
build_containers  # this would call 'docker build'
```
The cmake step would generate the CMakeFiles as usual, CPackConfig.cmake, etc. But in addition to a Makefile or `*.ninja` files, it would emit one or more dockerfiles, along with `build_containers`, a script containing calls to `docker build` or perhaps `docker-compose build`.
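As a sketch (everything here is invented output, including the dockerfile and tag names), `build_containers` might be little more than:

```shell
#!/bin/sh
# Hypothetical generated script: build each emitted dockerfile in
# dependency order, tagging each one so later builds can reuse it.
set -e
docker build -f Dockerfile.deps    -t proj/deps    .
docker build -f Dockerfile.targets -t proj/targets .
docker build -f Dockerfile.final   -t proj/final   .
```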
These dockerfiles would `COPY` in the build tree and `RUN` each target incrementally.
You'd then still have the typical options to CPack, generate a .deb or .tar or whatever, or additionally, generate a data-only volume image, which is basically:

```dockerfile
FROM busybox
COPY --from=final_build /build/artifacts /build/artifacts
```
This gives you a really lightweight and convenient way to copy files into other dockerfiles, or back onto the host operating system, without having to have all your compiler tools locally installed.
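For instance, pulling the artifacts back onto the host from such an image needs nothing but the docker CLI (the image name here follows on from the example above):

```shell
# Build the data-only image, then copy its artifacts out through a
# stopped container; no compiler toolchain needed on the host.
docker build -t proj-artifacts .
id=$(docker create proj-artifacts)
docker cp "$id":/build/artifacts ./artifacts
docker rm "$id"
```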
So why go through all this trouble?
- This will greatly speed up large project builds already using docker
- More deterministic builds
- Less persistent state (this is half of the Two Hard Problems of CS)
- More portability (you don't need a dedicated CentOS dashboard machine, just start from the right image. And yes, there are Windows containers)
- Easy CI integration
- Better fit with container idioms (e.g. `RUN` is assumed deterministic, `ADD` is used to grab URLs which may change)
- Lighter containers, since it's easier to grab artifacts without the build cruft
- Disentangle superbuilds
I think this has the potential to be a complete game-changer, the same way that containers changed the provisioning/CI/deployment game.
P.S. still toying with the working name Seaworthy/CWorthy (obviously a pun on CWhatever, tying in the nautical/shipping container theme), so I welcome suggestions on which you prefer.