Add abstraction to specify source file encoding for compilers
The encoding of source files can currently only be enforced by manually adding the corresponding flags to the compiler options.
It would be nice to have something like this:
set(CMAKE_SOURCE_CHARSET UTF-8)
Currently, compilers may not treat the source files as UTF-8; see here (section "UTF-8 support"):
MSVC, EDG, Clang, GCC support compiling UTF-8 source files.
- Currently (Clang 11), Clang only supports UTF-8 and assumes all files are UTF-8. BOMs are ignored.
- GCC supports UTF-8 through iconv, and the command line flag -finput-charset=UTF-8 can be used to interpret source files as UTF-8. The default encoding is inferred from the environment and falls back to UTF-8 when that is not possible. BOMs are ignored.
- MSVC supports UTF-8 source files with the /source-charset:UTF-8 command line flag. MSVC uses UTF-8 by default when a BOM is present.
For MSVC, this flag is already set in some places, e.g. in the VCPKG toolchain or within LLVM. As you can see there, there may be exceptions where this option doesn't work (MSVC too old), and this is just for MSVC. I'm not sure about the other compilers (e.g. how Intel, Borland, ... handle this). So it would be nice to have a CMake option so that not every project has to research this itself. If you set a source charset that is not supported, CMake should warn about it.
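For illustration, here is a rough sketch of what each project has to write by hand today. It uses only the flags quoted above. The structure and the warning text are my own assumptions, not an existing CMake feature, and it does not check whether the MSVC version is new enough to accept the flag.

```cmake
# Manual workaround (sketch): pick the UTF-8 source charset flag per compiler.
# Assumes a C++ project; CMAKE_C_COMPILER_ID would need the same treatment for C.
if(CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
  # Older MSVC versions may not accept this flag (not checked here).
  add_compile_options(/source-charset:UTF-8)
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
  add_compile_options(-finput-charset=UTF-8)
elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  # Per the quote above, Clang assumes UTF-8, so no flag is added.
else()
  # Intel, Borland, ...: behavior unknown to this sketch.
  message(WARNING
    "Don't know how to request UTF-8 source files for compiler "
    "'${CMAKE_CXX_COMPILER_ID}'")
endif()
```

A built-in CMAKE_SOURCE_CHARSET variable would replace this whole block with the single set() call shown at the top and keep the per-compiler knowledge in one place.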