Add Link Database Generation
Compilation Database
The JSON Compilation Database is a format that software analysis tools, such as Clang-based tools and IDEs, use to replay compilations independently of the build system. CMake can generate a compilation database: a JSON file named compile_commands.json that looks like this:
[
{ "directory": "/home/user/llvm/build",
"command": "/usr/bin/clang++ -Irelative -DSOMEDEF=5 -c -o file.o file.cc",
"file": "file.cc" },
...
]
Problem
However, some tools also need information about other build steps, such as linking. Suppose a called function is defined in another translation unit:
a.cpp:
// g is only declared here; its definition lives in another translation unit.
int g(int a);

int f(int a) {
    return g(a);
}
x.cpp:
int g(int a) {
    return a + 1;
}
y.cpp:
int g(int a) {
    return a - 1;
}
A tool may need to know which function is actually called, but this information is only available at the link step, because several functions with the same name may exist in different translation units. Some examples of tools that can use this knowledge:
- Unit testing tools that rely on code analysis need to know which function is called in order to produce correct test results.
- IDEs need it to decide which file to open when you click "Go to definition" on a function call.
Knowing which function is called is only one example showing that the compilation database is not enough for modern software analysis. The compilation database also contains no information about the libraries used, exported functions, or other link flags.
Solution
Because link flags, exported functions, used libraries, and the compilation units that make up executables and libraries can all be described by a link command, we propose generating a Link Database. A simple example of link_commands.json:
[
{
"command": "/usr/bin/ar qc libcmstd.a CMakeFiles/cmstd.dir/cm/bits/fs_path.cxx.o CMakeFiles/cmstd.dir/cm/bits/string_view.cxx.o",
"directory": "/home/myuser/cmake/build/Utilities/std",
"files": [
"/home/myuser/cmake/build/Utilities/std/CMakeFiles/cmstd.dir/cm/bits/fs_path.cxx.o",
"/home/myuser/cmake/build/Utilities/std/CMakeFiles/cmstd.dir/cm/bits/string_view.cxx.o"
]
},
{
"command": "/usr/bin/cc -Wcast-align -Werror-implicit-function-declaration -Wchar-subscripts -Wall -W -Wpointer-arith -Wwrite-strings -Wformat-security -Wmissing-format-attribute -fno-common -Wundef CMakeFiles/pseudonl_purify.dir/ret0.c.o -o purify",
"directory": "/home/myuser/cmake/build/Tests/CMakeLib/PseudoMemcheck/NoLog",
"files": [
"/home/myuser/cmake/build/Tests/CMakeLib/PseudoMemcheck/NoLog/CMakeFiles/pseudonl_purify.dir/ret0.c.o"
]
},
...
]
This file can contain all the link commands used to create libraries and executables. The "files" field lists the object files and static libraries that each link step needs. Software analysis tools can benefit from such a format.
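As an illustration of how a tool might consume both files together, here is a minimal Python sketch that maps each link command back to the source files it links. It is not part of our implementation. The function names are hypothetical, and it assumes that the object path of each compile step can be read either from the optional "output" field of the compilation database or from the token following -o in the command.

```python
import shlex
from pathlib import PurePosixPath


def _resolve(directory, path):
    """Resolve a possibly relative path against an entry's "directory"."""
    return str(PurePosixPath(directory) / path)


def object_to_source(compile_db):
    """Map absolute object file paths to absolute source file paths.

    Assumption: the object path is the optional "output" field, or
    else the token that follows -o in the compile command.
    """
    mapping = {}
    for entry in compile_db:
        args = entry.get("arguments") or shlex.split(entry["command"])
        out = entry.get("output")
        if out is None and "-o" in args[:-1]:
            out = args[args.index("-o") + 1]
        if out is not None:
            obj = _resolve(entry["directory"], out)
            mapping[obj] = _resolve(entry["directory"], entry["file"])
    return mapping


def sources_per_link(compile_db, link_db):
    """Return (link command, [source files]) pairs.

    Files without a matching compile step (for example static
    libraries) are skipped in this sketch.
    """
    obj2src = object_to_source(compile_db)
    result = []
    for entry in link_db:
        objects = [_resolve(entry["directory"], f) for f in entry["files"]]
        result.append((entry["command"], [obj2src[o] for o in objects if o in obj2src]))
    return result
```

For the a.cpp, x.cpp, y.cpp example above, a link entry whose "files" lists a.o and x.o would map to a.cpp and x.cpp, which tells the tool that f calls the g defined in x.cpp.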
Current State
We have already implemented Link Database generation for several build systems, including CMake (https://github.com/Software-Analysis-Team/CMake/tree/develop), and would be happy to merge this work into CMake master. It has been tested successfully on several large open source projects, including XGBoost, LLVM, GoogleTest, and 10 others.
Some problems remain (this list may be incomplete):
- Sometimes it is hard to find the full path to a shared library.
- Some files are downloaded from the internet, which makes them harder to analyze. However, the Link Database is still valuable for software analysis.
Note that the commands are not guaranteed to appear in the order in which they must be executed. For example, you may get:
[
{
"command": "clang -o exe some_lib.a",
"directory": "/home/myuser/work",
"files": [
"some_lib.a"
]
},
{
"command": "ar qc -o some_lib.a some_object.o",
"directory": "/home/myuser/work",
"files": [
"some_object.o"
]
}
]
In this example the second command must be executed before the first, and that is fine.
Please feel free to share your thoughts on the usefulness of such a feature in CMake and ideas on how the Link Database can be improved.
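A consumer that does need execution order can recover it from the entries themselves. The following Python sketch (Python 3.9+ for graphlib) is one possible approach, not something the proposed format prescribes. The function names are hypothetical, and because the format has no output field, the output of each command is guessed heuristically: the token after -o if present, otherwise the first argument ending in .a.

```python
import shlex
from graphlib import TopologicalSorter
from pathlib import PurePosixPath


def guess_output(entry):
    """Heuristically guess the file a link command produces.

    Assumption: the output is the token after -o, or else the first
    argument ending in ".a" (ar style). Returns None if neither fits.
    """
    args = shlex.split(entry["command"])
    if "-o" in args[:-1]:
        name = args[args.index("-o") + 1]
    else:
        name = next((a for a in args[1:] if a.endswith(".a")), None)
    if name is None:
        return None
    return str(PurePosixPath(entry["directory"]) / name)


def order_link_commands(entries):
    """Return the entries so that producers come before their consumers."""
    producer = {guess_output(e): i for i, e in enumerate(entries)}
    graph = {}
    for i, entry in enumerate(entries):
        base = PurePosixPath(entry["directory"])
        inputs = {str(base / f) for f in entry["files"]}
        # Depend on every other entry whose guessed output we consume.
        graph[i] = {producer[p] for p in inputs if p in producer} - {i}
    # static_order() yields each node after all of its dependencies.
    return [entries[i] for i in TopologicalSorter(graph).static_order()]
```

Applied to the two entries above, this returns the ar command first and the clang command second. If the guessed dependencies form a cycle, TopologicalSorter raises graphlib.CycleError, which a real tool would need to handle.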