Commit f62d301b authored by Betsy McPhail, committed by Brad King

ctest: Optionally avoid starting tests that may exceed a given CPU load

Add a TestLoad setting to CTest that can be set via a new --test-load
command-line option, CTEST_TEST_LOAD variable, or TEST_LOAD option to
the ctest_test command.  Teach cmCTestMultiProcessHandler to measure
the CPU load and avoid starting tests that may take more than the
spare load currently available.  The expression

 <current_load> + <test_processors> <= <max-load>

must be true to start a new test.
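
As a minimal sketch of that condition (the helper name `CanStartTest` is hypothetical and does not appear in the patch), the check can be written in C++ as a subtraction, so that the unsigned arithmetic cannot wrap around:

```cpp
#include <cstddef>

// Hypothetical helper, not part of CTest: returns true when a test that
// occupies `testProcessors` processors still fits under `maxLoad`, given
// the currently measured `currentLoad`.
// Equivalent to: currentLoad + testProcessors <= maxLoad, but written as a
// subtraction so that very large unsigned inputs cannot overflow.
bool CanStartTest(unsigned long currentLoad, std::size_t testProcessors,
                  unsigned long maxLoad)
{
  if (currentLoad > maxLoad) {
    return false; // already over the threshold, so nothing fits
  }
  return testProcessors <= maxLoad - currentLoad;
}
```

With a load of 5 and a threshold of 2, as in the `test-load-fail` case of this commit's tests, `CanStartTest(5, 1, 2)` is false and no single-processor test is admitted.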

Co-Author: Zack Galbreath <zack.galbreath@kitware.com>
parent 07c550ca
......@@ -14,6 +14,7 @@ Perform the :ref:`CTest MemCheck Step` as a :ref:`Dashboard Client`.
[EXCLUDE_LABEL <label-exclude-regex>]
[INCLUDE_LABEL <label-include-regex>]
[PARALLEL_LEVEL <level>]
[TEST_LOAD <threshold>]
[SCHEDULE_RANDOM <ON|OFF>]
[STOP_TIME <time-of-day>]
[RETURN_VALUE <result-var>]
......
......@@ -14,6 +14,7 @@ Perform the :ref:`CTest Test Step` as a :ref:`Dashboard Client`.
[EXCLUDE_LABEL <label-exclude-regex>]
[INCLUDE_LABEL <label-include-regex>]
[PARALLEL_LEVEL <level>]
[TEST_LOAD <threshold>]
[SCHEDULE_RANDOM <ON|OFF>]
[STOP_TIME <time-of-day>]
[RETURN_VALUE <result-var>]
......@@ -61,6 +62,13 @@ The options are:
Specify a positive number representing the number of tests to
be run in parallel.
``TEST_LOAD <threshold>``
While running tests in parallel, try not to start tests when they
may cause the CPU load to pass above a given threshold. If not
specified the :variable:`CTEST_TEST_LOAD` variable will be checked,
and then the ``--test-load`` command-line argument to :manual:`ctest(1)`.
See also the ``TestLoad`` setting in the :ref:`CTest Test Step`.
``SCHEDULE_RANDOM <ON|OFF>``
Launch tests in a random order. This may be useful for detecting
implicit test dependencies.
......
......@@ -386,6 +386,7 @@ Variables for CTest
/variable/CTEST_SVN_COMMAND
/variable/CTEST_SVN_OPTIONS
/variable/CTEST_SVN_UPDATE_OPTIONS
/variable/CTEST_TEST_LOAD
/variable/CTEST_TEST_TIMEOUT
/variable/CTEST_TRIGGER_SITE
/variable/CTEST_UPDATE_COMMAND
......
......@@ -66,6 +66,13 @@ Options
number of jobs. This option can also be set by setting the
environment variable ``CTEST_PARALLEL_LEVEL``.
``--test-load <level>``
While running tests in parallel (e.g. with ``-j``), try not to start
tests when they may cause the CPU load to pass above a given threshold.
When ``ctest`` is run as a `Dashboard Client`_ this sets the
``TestLoad`` option of the `CTest Test Step`_.
``-Q,--quiet``
Make ctest quiet.
......@@ -776,6 +783,13 @@ Arguments to the command may specify some of the step settings.
Configuration settings include:
``TestLoad``
While running tests in parallel (e.g. with ``-j``), try not to start
tests when they may cause the CPU load to pass above a given threshold.
* `CTest Script`_ variable: :variable:`CTEST_TEST_LOAD`
* :module:`CTest` module variable: ``CTEST_TEST_LOAD``
``TimeOut``
The default timeout for each test if not specified by the
:prop_test:`TIMEOUT` test property.
......
ctest-test-load-option
----------------------
* CTest learned to optionally measure the CPU load during parallel
testing and avoid starting tests that may cause the load to exceed
a given threshold. See the :manual:`ctest(1)` command ``--test-load``
option, the ``TestLoad`` setting of the :ref:`CTest Test Step`,
the :variable:`CTEST_TEST_LOAD` variable, and the ``TEST_LOAD``
option of the :command:`ctest_test` command.
CTEST_TEST_LOAD
---------------
Specify the ``TestLoad`` setting in the :ref:`CTest Test Step`
of a :manual:`ctest(1)` dashboard client script. This sets the
default value for the ``TEST_LOAD`` option of the :command:`ctest_test`
command.
......@@ -95,6 +95,10 @@ SlurmRunCommand: @SLURM_SRUN_COMMAND@
# Currently set to 25 minutes
TimeOut: @DART_TESTING_TIMEOUT@
# During parallel testing CTest will not start a new test if doing
# so would cause the system load to exceed this value.
TestLoad: @CTEST_TEST_LOAD@
UseLaunchers: @CTEST_USE_LAUNCHERS@
CurlOptions: @CTEST_CURL_OPTIONS@
# warning, if you add new options here that have to do with submit,
......
......@@ -23,6 +23,7 @@ cmCTestGenericHandler::cmCTestGenericHandler()
this->SubmitIndex = 0;
this->AppendXML = false;
this->Quiet = false;
this->TestLoad = 0;
}
//----------------------------------------------------------------------
......@@ -70,6 +71,7 @@ void cmCTestGenericHandler::SetPersistentOption(const std::string& op,
void cmCTestGenericHandler::Initialize()
{
this->AppendXML = false;
this->TestLoad = 0;
this->Options.clear();
t_StringToString::iterator it;
for ( it = this->PersistentOptions.begin();
......
......@@ -89,6 +89,8 @@ public:
void SetAppendXML(bool b) { this->AppendXML = b; }
void SetQuiet(bool b) { this->Quiet = b; }
bool GetQuiet() { return this->Quiet; }
void SetTestLoad(unsigned long load) { this->TestLoad = load; }
unsigned long GetTestLoad() const { return this->TestLoad; }
protected:
bool StartResultingXML(cmCTest::Part part,
......@@ -97,6 +99,7 @@ protected:
bool AppendXML;
bool Quiet;
unsigned long TestLoad;
cmSystemTools::OutputOption HandlerVerbose;
cmCTest *CTest;
t_StringToString Options;
......
......@@ -13,12 +13,15 @@
#include "cmProcess.h"
#include "cmStandardIncludes.h"
#include "cmCTest.h"
#include "cmCTestScriptHandler.h"
#include "cmSystemTools.h"
#include <stdlib.h>
#include <stack>
#include <list>
#include <float.h>
#include <math.h>
#include <cmsys/FStream.hxx>
#include <cmsys/SystemInformation.hxx>
class TestComparator
{
......@@ -40,6 +43,7 @@ private:
cmCTestMultiProcessHandler::cmCTestMultiProcessHandler()
{
this->ParallelLevel = 1;
this->TestLoad = 0;
this->Completed = 0;
this->RunningCount = 0;
this->StopTimePassed = false;
......@@ -84,6 +88,11 @@ void cmCTestMultiProcessHandler::SetParallelLevel(size_t level)
this->ParallelLevel = level < 1 ? 1 : level;
}
void cmCTestMultiProcessHandler::SetTestLoad(unsigned long load)
{
this->TestLoad = load;
}
//---------------------------------------------------------
void cmCTestMultiProcessHandler::RunTests()
{
......@@ -213,6 +222,11 @@ inline size_t cmCTestMultiProcessHandler::GetProcessorsUsed(int test)
return processors;
}
std::string cmCTestMultiProcessHandler::GetName(int test)
{
return this->Properties[test]->Name;
}
//---------------------------------------------------------
bool cmCTestMultiProcessHandler::StartTest(int test)
{
......@@ -259,6 +273,46 @@ void cmCTestMultiProcessHandler::StartNextTests()
return;
}
bool allTestsFailedTestLoadCheck = false;
bool usedFakeLoadForTesting = false;
size_t minProcessorsRequired = this->ParallelLevel;
std::string testWithMinProcessors = "";
cmsys::SystemInformation info;
unsigned long systemLoad = 0;
size_t spareLoad = 0;
if (this->TestLoad > 0)
{
// Activate possible wait.
allTestsFailedTestLoadCheck = true;
// Check for a fake load average value used in testing.
if (const char* fake_load_value =
cmSystemTools::GetEnv("__CTEST_FAKE_LOAD_AVERAGE_FOR_TESTING"))
{
usedFakeLoadForTesting = true;
if (!cmSystemTools::StringToULong(fake_load_value, &systemLoad))
{
cmSystemTools::Error("Failed to parse fake load value: ",
fake_load_value);
}
}
// If it's not set, look up the true load average.
else
{
systemLoad = static_cast<unsigned long>(ceil(info.GetLoadAverage()));
}
spareLoad = (this->TestLoad > systemLoad ?
this->TestLoad - systemLoad : 0);
// Don't start more tests than the spare load can support.
if (numToStart > spareLoad)
{
numToStart = spareLoad;
}
}
TestList copy = this->SortedTests;
for(TestList::iterator test = copy.begin(); test != copy.end(); ++test)
{
......@@ -274,18 +328,74 @@ void cmCTestMultiProcessHandler::StartNextTests()
}
size_t processors = GetProcessorsUsed(*test);
bool testLoadOk = true;
if (this->TestLoad > 0)
{
if (processors <= spareLoad)
{
cmCTestLog(this->CTest, DEBUG,
"OK to run " << GetName(*test) <<
", it requires " << processors <<
" procs & system load is: " <<
systemLoad << std::endl);
allTestsFailedTestLoadCheck = false;
}
else
{
testLoadOk = false;
}
}
-if(processors <= numToStart && this->StartTest(*test))
+if (processors <= minProcessorsRequired)
{
-if(this->StopTimePassed)
-{
-return;
-}
-numToStart -= processors;
+minProcessorsRequired = processors;
+testWithMinProcessors = GetName(*test);
+}
+if(testLoadOk && processors <= numToStart && this->StartTest(*test))
+{
+if(this->StopTimePassed)
+{
+return;
+}
+numToStart -= processors;
}
else if(numToStart == 0)
{
-return;
+break;
}
}
if (allTestsFailedTestLoadCheck)
{
cmCTestLog(this->CTest, HANDLER_OUTPUT, "***** WAITING, ");
if (this->SerialTestRunning)
{
cmCTestLog(this->CTest, HANDLER_OUTPUT,
"Waiting for RUN_SERIAL test to finish.");
}
else
{
cmCTestLog(this->CTest, HANDLER_OUTPUT,
"System Load: " << systemLoad << ", "
"Max Allowed Load: " << this->TestLoad << ", "
"Smallest test " << testWithMinProcessors <<
" requires " << minProcessorsRequired);
}
cmCTestLog(this->CTest, HANDLER_OUTPUT, "*****" << std::endl);
if (usedFakeLoadForTesting)
{
// Break out of the infinite loop of waiting for our fake load
// to come down.
this->StopTimePassed = true;
}
else
{
// Wait between 1 and 5 seconds before trying again.
cmCTestScriptHandler::SleepInSeconds(
cmSystemTools::RandomSeed() % 5 + 1);
}
}
}
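
The spare-load arithmetic that `StartNextTests()` performs above can be isolated as a small sketch. The helper names `SpareLoad` and `ClampNumToStart` are hypothetical, and the guard against a negative load average is an assumption added here for safety; the patch itself does not have one.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: the load still available under `testLoad`.
// The measured average is rounded up, as in the patch, so a fractional
// load counts as a full unit.
unsigned long SpareLoad(unsigned long testLoad, double loadAverage)
{
  if (loadAverage < 0.0) {
    loadAverage = 0.0; // assumption: treat an unavailable reading as idle
  }
  unsigned long systemLoad =
    static_cast<unsigned long>(std::ceil(loadAverage));
  return testLoad > systemLoad ? testLoad - systemLoad : 0;
}

// Hypothetical helper: never plan to start more work than the spare
// load can absorb.
std::size_t ClampNumToStart(std::size_t numToStart, unsigned long spareLoad)
{
  return numToStart > spareLoad ? static_cast<std::size_t>(spareLoad)
                                : numToStart;
}
```

For the spoofed load of 5 used by the tests in this commit, a threshold of 2 leaves a spare load of 0 (so CTest waits), while a threshold of 10 leaves 5.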
......
......@@ -37,6 +37,7 @@ public:
void SetTests(TestMap& tests, PropertiesMap& properties);
// Set the max number of tests that can be run at the same time.
void SetParallelLevel(size_t);
void SetTestLoad(unsigned long load);
virtual void RunTests();
void PrintTestList();
void PrintLabels();
......@@ -93,6 +94,7 @@ protected:
bool CheckCycles();
int FindMaxIndex();
inline size_t GetProcessorsUsed(int index);
std::string GetName(int index);
void LockResources(int index);
void UnlockResources(int index);
......@@ -116,6 +118,7 @@ protected:
std::set<std::string> LockedResources;
std::vector<cmCTestTestHandler::cmCTestTestResult>* TestResults;
size_t ParallelLevel; // max number of process that can be run at once
unsigned long TestLoad;
std::set<cmCTestRunTest*> RunningTests; // current running tests
cmCTestTestHandler * TestHandler;
cmCTest* CTest;
......
......@@ -26,6 +26,7 @@ cmCTestTestCommand::cmCTestTestCommand()
this->Arguments[ctt_PARALLEL_LEVEL] = "PARALLEL_LEVEL";
this->Arguments[ctt_SCHEDULE_RANDOM] = "SCHEDULE_RANDOM";
this->Arguments[ctt_STOP_TIME] = "STOP_TIME";
this->Arguments[ctt_TEST_LOAD] = "TEST_LOAD";
this->Arguments[ctt_LAST] = 0;
this->Last = ctt_LAST;
}
......@@ -103,6 +104,38 @@ cmCTestGenericHandler* cmCTestTestCommand::InitializeHandler()
{
this->CTest->SetStopTime(this->Values[ctt_STOP_TIME]);
}
// Test load is determined by: TEST_LOAD argument,
// or CTEST_TEST_LOAD script variable, or ctest --test-load
// command line argument... in that order.
unsigned long testLoad;
const char* ctestTestLoad
= this->Makefile->GetDefinition("CTEST_TEST_LOAD");
if(this->Values[ctt_TEST_LOAD] && *this->Values[ctt_TEST_LOAD])
{
if (!cmSystemTools::StringToULong(this->Values[ctt_TEST_LOAD], &testLoad))
{
testLoad = 0;
cmCTestLog(this->CTest, WARNING, "Invalid value for 'TEST_LOAD' : "
<< this->Values[ctt_TEST_LOAD] << std::endl);
}
}
else if(ctestTestLoad && *ctestTestLoad)
{
if (!cmSystemTools::StringToULong(ctestTestLoad, &testLoad))
{
testLoad = 0;
cmCTestLog(this->CTest, WARNING,
"Invalid value for 'CTEST_TEST_LOAD' : " <<
ctestTestLoad << std::endl);
}
}
else
{
testLoad = this->CTest->GetTestLoad();
}
handler->SetTestLoad(testLoad);
handler->SetQuiet(this->Quiet);
return handler;
}
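
The lookup order implemented in `InitializeHandler()` above can be sketched as a standalone function. `ResolveTestLoad` and `ParseULong` are hypothetical names; `ParseULong` is only a stand-in for `cmSystemTools::StringToULong`, and the warning output of the real code is omitted.

```cpp
#include <cerrno>
#include <cstdlib>

// Stand-in for cmSystemTools::StringToULong (an assumption about its
// behavior): succeeds only if the whole string is a base-10 number.
bool ParseULong(const char* str, unsigned long* value)
{
  errno = 0;
  char* end = nullptr;
  *value = std::strtoul(str, &end, 10);
  return end != str && *end == '\0' && errno == 0;
}

// Hypothetical helper mirroring the branch order in the patch:
// 1. TEST_LOAD argument, 2. CTEST_TEST_LOAD variable, 3. --test-load value.
// An unparsable value in the first non-empty source yields 0.
unsigned long ResolveTestLoad(const char* argValue, const char* scriptVar,
                              unsigned long cliValue)
{
  unsigned long load = 0;
  if (argValue && *argValue) {
    return ParseULong(argValue, &load) ? load : 0;
  }
  if (scriptVar && *scriptVar) {
    return ParseULong(scriptVar, &load) ? load : 0;
  }
  return cliValue;
}
```

Note that a result of 0 is not final in the patch: `cmCTestTestHandler::ProcessDirectory` uses the handler's value only when it is greater than zero and otherwise falls back to `cmCTest::GetTestLoad()`.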
......
......@@ -60,6 +60,7 @@ protected:
ctt_PARALLEL_LEVEL,
ctt_SCHEDULE_RANDOM,
ctt_STOP_TIME,
ctt_TEST_LOAD,
ctt_LAST
};
};
......
......@@ -1062,6 +1062,14 @@ void cmCTestTestHandler::ProcessDirectory(std::vector<std::string> &passed,
parallel->SetParallelLevel(this->CTest->GetParallelLevel());
parallel->SetTestHandler(this);
parallel->SetQuiet(this->Quiet);
if(this->TestLoad > 0)
{
parallel->SetTestLoad(this->TestLoad);
}
else
{
parallel->SetTestLoad(this->CTest->GetTestLoad());
}
*this->LogFile << "Start testing: "
<< this->CTest->CurrentTime() << std::endl
......
......@@ -294,6 +294,7 @@ cmCTest::cmCTest()
this->LabelSummary = true;
this->ParallelLevel = 1;
this->ParallelLevelSetInCli = false;
this->TestLoad = 0;
this->SubmitIndex = 0;
this->Failover = false;
this->BatchJobs = false;
......@@ -393,6 +394,11 @@ void cmCTest::SetParallelLevel(int level)
this->ParallelLevel = level < 1 ? 1 : level;
}
void cmCTest::SetTestLoad(unsigned long load)
{
this->TestLoad = load;
}
//----------------------------------------------------------------------------
bool cmCTest::ShouldCompressTestOutput()
{
......@@ -820,6 +826,20 @@ bool cmCTest::UpdateCTestConfiguration()
cmSystemTools::ChangeDirectory(this->BinaryDir);
}
this->TimeOut = atoi(this->GetCTestConfiguration("TimeOut").c_str());
std::string const& testLoad = this->GetCTestConfiguration("TestLoad");
if (!testLoad.empty())
{
unsigned long load;
if (cmSystemTools::StringToULong(testLoad.c_str(), &load))
{
this->SetTestLoad(load);
}
else
{
cmCTestLog(this, WARNING, "Invalid value for 'Test Load' : "
<< testLoad << std::endl);
}
}
if ( this->ProduceXML )
{
this->CompressXMLFiles = cmSystemTools::IsOn(
......@@ -2051,6 +2071,21 @@ bool cmCTest::HandleCommandLineArguments(size_t &i,
}
}
if(this->CheckArgument(arg, "--test-load") && i < args.size() - 1)
{
i++;
unsigned long load;
if (cmSystemTools::StringToULong(args[i].c_str(), &load))
{
this->SetTestLoad(load);
}
else
{
cmCTestLog(this, WARNING,
"Invalid value for 'Test Load' : " << args[i] << std::endl);
}
}
if(this->CheckArgument(arg, "--no-compress-output"))
{
this->CompressTestOutput = false;
......
......@@ -161,6 +161,9 @@ public:
int GetParallelLevel() { return this->ParallelLevel; }
void SetParallelLevel(int);
unsigned long GetTestLoad() { return this->TestLoad; }
void SetTestLoad(unsigned long);
/**
* Check if CTest file exists
*/
......@@ -499,6 +502,8 @@ private:
int ParallelLevel;
bool ParallelLevelSetInCli;
unsigned long TestLoad;
int CompatibilityMode;
// information for the --build-and-test options
......
......@@ -98,6 +98,7 @@ static const char * cmDocumentationOptions[][2] =
{"--test-command", "The test to run with the --build-and-test option."},
{"--test-timeout", "The time limit in seconds, internal use only."},
{"--test-load", "CPU load threshold for starting new parallel tests."},
{"--tomorrow-tag", "Nightly or experimental starts with next day tag."},
{"--ctest-config", "The configuration file used to initialize CTest state "
"when submitting dashboards."},
......
include(RunCMake)
set(RunCMake_TEST_TIMEOUT 60)
unset(ENV{CTEST_PARALLEL_LEVEL})
unset(ENV{CTEST_OUTPUT_ON_FAILURE})
......@@ -52,3 +53,35 @@ add_test(MergeOutput \"${CMAKE_COMMAND}\" -P \"${RunCMake_SOURCE_DIR}/MergeOutpu
run_cmake_command(MergeOutput ${CMAKE_CTEST_COMMAND} -V)
endfunction()
run_MergeOutput()
function(run_TestLoad name load)
set(RunCMake_TEST_BINARY_DIR ${RunCMake_BINARY_DIR}/TestLoad)
set(RunCMake_TEST_NO_CLEAN 1)
file(REMOVE_RECURSE "${RunCMake_TEST_BINARY_DIR}")
file(MAKE_DIRECTORY "${RunCMake_TEST_BINARY_DIR}")
file(WRITE "${RunCMake_TEST_BINARY_DIR}/CTestTestfile.cmake" "
add_test(TestLoad1 \"${CMAKE_COMMAND}\" -E echo \"test of --test-load\")
add_test(TestLoad2 \"${CMAKE_COMMAND}\" -E echo \"test of --test-load\")
")
run_cmake_command(${name} ${CMAKE_CTEST_COMMAND} -j2 --test-load ${load} --test-timeout 5)
endfunction()
# Tests for the --test-load feature of ctest
#
# Spoof a load average value to make these tests more reliable.
set(ENV{__CTEST_FAKE_LOAD_AVERAGE_FOR_TESTING} 5)
# Verify that new tests are not started when the load average exceeds
# our threshold.
run_TestLoad(test-load-fail 2)
# Verify that warning message is displayed but tests still start when
# an invalid argument is given.
run_TestLoad(test-load-invalid 'two')
# Verify that new tests are started when the load average falls below
# our threshold.
run_TestLoad(test-load-pass 10)
unset(ENV{__CTEST_FAKE_LOAD_AVERAGE_FOR_TESTING})
^Test project .*/Tests/RunCMake/CTestCommandLine/TestLoad
\*\*\*\*\* WAITING, System Load: 5, Max Allowed Load: 2, Smallest test TestLoad[1-2] requires 1\*\*\*\*\*