execute_process: cmExecuteProcessCommandFixText() for UTF-8 output causes assertion failures on Windows due to isspace() usage
I accidentally caught this bug while working on !8228 (merged) .
if the child process would generate some UTF-8 output, the cmExecuteProcessCommandFixText()'s logic related to whitespace trimming would fail with the assertion error (on debug builds):
In my case, the guilty text was localized text from MSVC linker (There is a text after "decoding" it via online mojibakes decoder (in short, it complains that the entry point is not set for linker, the object files are not defined, the /MACHINE
parameter is not set, so we use the X64, and the /v
flag is not recognized):
Decoded text
Microsoft (R) Incremental Linker Version 14.39.33321.0
Copyright (C) Microsoft Corporation. All rights reserved.
LINK : warning LNK4044: нераспознанный параметр "/v"; игнорируется
LINK : warning LNK4001: не указаны объектные файлы; использованы библиотеки
LINK : warning LNK4068: параметр /MACHINE не указан; принимается по умолчанию на X64
LINK : fatal error LNK1561: точка входа должна быть определена
The root cause is this code snippet (https://gitlab.kitware.com/cmake/cmake/-/blob/1f66051983ef0bdefa5de139fa9013830a4c3047/Source/cmExecuteProcessCommand.cxx#L379):
// Fix the text in the output strings.
cmExecuteProcessCommandFixText(outputData.Output,
arguments.OutputStripTrailingWhitespace);
cmExecuteProcessCommandFixText(errorData.Output,
arguments.ErrorStripTrailingWhitespace);
It calls this function:
bool cmExecuteProcessCommandIsWhitespace(char c)
{
return (isspace(static_cast<int>(c)) || c == '\n' || c == '\r');
}
which uses the isspace()
C standard library function , which is tied to locale and, in general, don't work robustly as the locale machinery is implementation defined.
As for MSVC this function may forward the char as signed (which could be converted to signed int, in my case it was the "degrees mark" ('°'), and their code was '-80' in debugger, so the isspace()
's assertion about char ranges has been failed.
As a workaround we may add additional cast to unsigned char, and cast this to int (the old hack), but I consider to avoid the isspace() completely as it may not know about UTF-8 specific whitespace chars.
I would try to make a PR with the additional cast to unsigned char to avoid the assertion failure on debug builds on Windows.
Thanks for your attention and sorry for some bad English