CMake slows down if a variable assignment contains patterns with "@"
Hi everyone.
We've discovered an issue with CMake's handling of large string constants containing sporadic "@" characters.
If such a pattern is encountered, CMake's command parser slows down significantly. In our case, handling the file went from taking half a second to taking about a minute.
This bug affects all CMake versions we tested, from 2.8.7 all the way up to 3.8.0-rc1, on both Windows and Linux.
In our case, this makes a build that usually takes 9 minutes take 32 minutes instead.
To replicate this bug, all you need to do is create a "CMakeLists.txt" containing the following pattern:
project(hello)
set( STR "<LONG-STRING>" )
The long string is a list of Unix paths, i.e. a mixture of letters and some punctuation, separated by semicolons. To exhibit the bug, a few (but not all!) of the folders must contain an "@" character.
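A string of this shape can be generated with a short script. The Python sketch below is a hypothetical generator, not the script used for the attached files: the folder names (dir_*, sub-*, pkg_*.d) are made up, and only the overall shape matches the description, i.e. letters plus some punctuation, separated by semicolons, with exactly n_at folder names containing an "@".

```python
import random


def make_path_list(n_paths: int, n_at: int, seed: int = 0) -> str:
    """Return n_paths Unix-style paths joined by semicolons.

    Folder names are hypothetical. Exactly n_at of the paths get an "@"
    in one folder name; the rest use only letters, digits, "_", "-", ".".
    """
    rng = random.Random(seed)
    paths = [
        f"/opt/dir_{i % 97}/sub-{i % 13}/pkg_{i}.d" for i in range(n_paths)
    ]
    # Turn "pkg_" into "pkg@" in a few randomly chosen paths.
    for idx in rng.sample(range(n_paths), n_at):
        paths[idx] = paths[idx].replace("/pkg_", "/pkg@", 1)
    return ";".join(paths)


def write_cmakelists(filename: str, value: str) -> None:
    """Write a reproducer with the same shape as the example above."""
    with open(filename, "w", encoding="utf-8") as fh:
        fh.write("project(hello)\n")
        fh.write(f'set( STR "{value}" )\n')
```

For example, write_cmakelists("CMakeLists.txt", make_path_list(4500, 5)) writes a file whose string contains five "@" characters; a few thousand paths give a string in the same size range as the attached files.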
Attached to this bug report are three test files, each about 125 kB in size, which can be run with "cmake -H. -Bbuild".
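To time each file, the command can be wrapped in a small script. This Python sketch is an assumption, not the tooling behind the numbers in this report: it measures wall-clock time around one "cmake -H. -Bbuild" run per folder and deletes any old "build" folder first, so that every run is a fresh configure.

```python
import os
import shutil
import subprocess
import sys
import time

# The configure command from the report, run inside each test folder.
CMAKE_CONFIGURE = ["cmake", "-H.", "-Bbuild"]


def time_command(cmd: list[str], cwd: str) -> float:
    """Run cmd in cwd and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command fails.
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def main(folders: list[str]) -> None:
    for folder in folders:
        # Remove a previous build tree so each run is a fresh configure.
        shutil.rmtree(os.path.join(folder, "build"), ignore_errors=True)
        secs = time_command(CMAKE_CONFIGURE, folder)
        print(f"{folder}: {secs:.3f} secs")


if __name__ == "__main__":
    # Example: python time_cmake.py underscores adds adds+underscores
    main(sys.argv[1:])
```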
The three files contain the same paths with the following "minor" differences:
- One replaces all "@" characters with "_" characters. CMake handles this file in 0.5 seconds.
- One replaces all "_" characters with "@" characters. CMake handles this file in 0.75 seconds.
- One contains a few "@" characters and uses "_" everywhere else. In this case, CMake suddenly needs 56 seconds to finish.
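The three files differ only by character substitutions on the mixed string. In this Python sketch the dictionary keys reuse the folder names from the attached archive; the input string is assumed to be the mixed variant, and write_variants is a hypothetical helper.

```python
from pathlib import Path


def make_variants(mixed: str) -> dict[str, str]:
    """Derive the three test strings from the mixed one.

    `mixed` is assumed to contain a few "@" characters and "_" elsewhere.
    """
    return {
        "underscores": mixed.replace("@", "_"),  # every "@" becomes "_"
        "adds": mixed.replace("_", "@"),         # every "_" becomes "@"
        "adds+underscores": mixed,               # unchanged: the slow case
    }


def write_variants(mixed: str, root: Path) -> None:
    """Write root/<name>/CMakeLists.txt for each variant."""
    for name, value in make_variants(mixed).items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "CMakeLists.txt").write_text(
            f'project(hello)\nset( STR "{value}" )\n', encoding="utf-8"
        )
```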
We timed this behaviour for various string lengths to show how CMake's runtime scales with string length:
All times are in seconds.

Share  Length (chars)  No @ Chars  No _ Chars  Both @ and _
  10%           12729       0.580       0.573         0.814
  20%           25458       0.583       0.578         2.121
  30%           38187       0.654       0.591         4.644
  40%           50916       0.572       0.573         8.384
  50%           63645       0.599       0.600        13.359
  60%           76374       0.595       0.621        19.609
  70%           89103       0.583       0.691        27.074
  80%          101832       0.569       0.705        35.744
  90%          114561       0.566       0.717        45.679
 100%          127290       0.576       0.737        56.907
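One way to quantify the difference is to fit the slope of log(time) against log(length): a slope near 1 means linear growth, near 2 quadratic growth. The Python snippet below does a plain least-squares fit on the numbers from the table; it is an illustration added for analysis, not part of the original measurements.

```python
import math

# Length in characters -> seconds, copied from the timing table.
NO_AT = {
    12729: 0.580, 25458: 0.583, 38187: 0.654, 50916: 0.572, 63645: 0.599,
    76374: 0.595, 89103: 0.583, 101832: 0.569, 114561: 0.566, 127290: 0.576,
}
MIXED = {
    12729: 0.814, 25458: 2.121, 38187: 4.644, 50916: 8.384, 63645: 13.359,
    76374: 19.609, 89103: 27.074, 101832: 35.744, 114561: 45.679,
    127290: 56.907,
}


def loglog_slope(points: dict[int, float]) -> float:
    """Least-squares slope of log(time) over log(length)."""
    xs = [math.log(n) for n in points]
    ys = [math.log(t) for t in points.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


for name, data in (("No @ Chars", NO_AT), ("Both @ and _", MIXED)):
    print(f"{name}: slope {loglog_slope(data):.2f}")
```

With these numbers the fit gives a slope of about 1.9 for "Both @ and _", i.e. close to quadratic, and a slope close to 0 for "No @ Chars", where the runtime stays flat.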
We compiled CMake 3.8.0 with profiling enabled and obtained the following gprof flat profile, which shows which calls are slowed down:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 83.58     92.99    92.99   303315     0.00     0.00  yy_get_previous_state(void*)
 16.29    111.11    18.12   309471     0.00     0.00  yy_get_next_buffer(void*)
  0.01    111.15     0.01    81098     0.00     0.00  cmake::GetVariableWatch()
The call-graph entry for the lexer function that calls these time-consuming functions is:
% time     self children  called         name
           0.00   111.11  39804/39804    cmCommandArgument_yyparse(void*) [7]
  99.9     0.00   111.11  39804          cmCommandArgument_yylex(cmCommandArgumentParserHelper::ParserType*, void*) [9]
          92.99     0.00  303315/303315  yy_get_previous_state(void*) [10]
          18.12     0.00  309471/309471  yy_get_next_buffer(void*) [16]
           0.00     0.00  14352/18451    cmCommandArgumentParserHelper::AllocateParserType(cmCommandArgumentParserHelper::ParserType*, char const*, int) [4878]
           0.00     0.00  6156/6156      cmCommandArgument_yyensure_buffer_stack(void*) [5169]
           0.00     0.00  6156/6156      cmCommandArgument_yy_create_buffer(_IO_FILE*, int, void*) [5165]
           0.00     0.00  6156/18468     cmCommandArgument_yy_load_buffer_state(void*) [4877]
           0.00     0.00  4099/4099      cmCommandArgumentParserHelper::HandleEscapeSymbol(cmCommandArgumentParserHelper::ParserType*, char) [5275]
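This shows that almost all of the time is spent in the lexer of the command argument parser (cmCommandArgument_yylex), in particular in yy_get_previous_state and yy_get_next_buffer.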
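A possible, unconfirmed reading of this profile: in a flex-generated scanner, yy_get_previous_state rebuilds the scanner state by walking from the start of the current token to the end of the data read so far, and it runs after a buffer refill by yy_get_next_buffer. If input arrives in small chunks while a long token is being matched, this re-walking adds up to work that grows with the square of the token length. The Python cost model below illustrates only that arithmetic; the chunk size is an assumption, and we have not verified which token or input pattern triggers this in CMake.

```python
def rescan_cost(token_len: int, chunk: int) -> int:
    """Characters re-walked while one token of token_len chars is scanned.

    Hypothetical model: input arrives `chunk` characters at a time, and
    before scanning each newly read chunk the scanner state is rebuilt by
    walking over every character of the token read so far.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    cost = 0
    seen = 0
    while seen < token_len:
        cost += seen  # re-walk the part of the token already read
        seen = min(seen + chunk, token_len)
    return cost


for length in (1000, 2000, 4000):
    print(length, rescan_cost(length, 1), rescan_cost(length, 4096))
```

With one character per refill, doubling the token length roughly quadruples the re-walked characters (499500, 1999000 and 7998000 for lengths 1000, 2000 and 4000), while a chunk larger than the token makes the extra cost vanish in this model.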
To replicate this issue, use the TGZ file attached to this bug report; it contains the "CMakeLists.txt" files described above:
- The folder "adds" contains the file with many "@" characters.
- The folder "underscores" contains the file with no "@" characters.
- The folder "adds+underscores" contains the file with exactly 5 "@" characters, which exhibits the bug.