VTK-m: perform CUDA kernel scheduling parameter sweep studies

Capturing discussion found at: !1576 (merged)

Tasks:

Refactor GetGridsAndBlocks to call a function that computes the 1D or 3D grid and block values. This function would replace getNumSMs but keep the same call_once caching strategy.
Refactor the 3D GetGridsAndBlocks to use the computed values for the special cases (flat in X, small grids).
Add environment controls to allow for parameter sweeps. These would go into the function that replaces getNumSMs so that the environment can control all of the values.
Remove cuda/internal/TaskTuner.h and related code, which isn't needed once we have environment controls.
Run parameter sweep studies on all hardware we have access to, and build a spreadsheet of the values that perform best on each piece of hardware.
Add the resulting values to the function that computes the 1D/3D grid and block values.
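
As a minimal sketch of the first and third tasks, the replacement for getNumSMs could compute all scheduling values once and let the environment override each of them. Everything named here is an assumption made for illustration: the ScheduleParameters struct, the helper functions, and the VTKM_SKETCH_* environment variable names are hypothetical and do not exist in VTK-m. The defaults are copied from the OTHER/ANY row of the preset table in this issue.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>

// Hypothetical container for the values GetGridsAndBlocks would consume.
// Field names follow the Presets struct proposed in this issue.
struct ScheduleParameters
{
  int one_d_blocks_per_sm;
  int one_d_grids_per_block;
  int three_d_blocks_per_sm;
  int three_d_grids_per_block[3];
};

// Read a positive integer from an environment variable. Returns `fallback`
// when the variable is unset, empty, malformed, or out of range.
inline int EnvOrDefault(const char* name, int fallback)
{
  assert(name != nullptr);
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0')
  {
    return fallback;
  }
  char* end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value <= 0 || value > 65535)
  {
    return fallback;
  }
  return static_cast<int>(value);
}

// Build the parameters from defaults, letting the environment override each
// value. The variable names are placeholders chosen for this sketch.
inline ScheduleParameters ComputeScheduleParameters()
{
  ScheduleParameters p;
  p.one_d_blocks_per_sm = EnvOrDefault("VTKM_SKETCH_1D_BLOCKS_PER_SM", 32);
  p.one_d_grids_per_block = EnvOrDefault("VTKM_SKETCH_1D_GRIDS_PER_BLOCK", 128);
  p.three_d_blocks_per_sm = EnvOrDefault("VTKM_SKETCH_3D_BLOCKS_PER_SM", 32);
  p.three_d_grids_per_block[0] = EnvOrDefault("VTKM_SKETCH_3D_GRIDS_PER_BLOCK_X", 4);
  p.three_d_grids_per_block[1] = EnvOrDefault("VTKM_SKETCH_3D_GRIDS_PER_BLOCK_Y", 4);
  p.three_d_grids_per_block[2] = EnvOrDefault("VTKM_SKETCH_3D_GRIDS_PER_BLOCK_Z", 4);
  return p;
}

// Compute the parameters on first use and cache them with std::call_once,
// so the environment is read only once per process.
inline const ScheduleParameters& GetScheduleParameters()
{
  static std::once_flag flag;
  static ScheduleParameters params;
  std::call_once(flag, []() { params = ComputeScheduleParameters(); });
  return params;
}
```

With something like this in place, a sweep script would only need to set the variables before launching each benchmark run. Because the values are cached, changing the environment after the first call to GetScheduleParameters has no effect within the same process.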

A rough idea of how we can encode the best grid and block values (https://godbolt.org/z/8aqMWM):

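#include <vector>

// GPU generation the presets were tuned for; OTHER is the catch-all.
enum struct GPU_ARCH { OTHER, PASCAL, VOLTA, TURING };
// Market segment of the card; ANY matches every card of an architecture.
enum struct GPU_STRATA { ANY, CONSUMER, WORKSTATION, HPC };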

struct Presets
{
    GPU_ARCH architecture;
    GPU_STRATA strata;
    
    int one_d_blocks_per_sm;
    int one_d_grids_per_block;

    int three_d_blocks_per_sm;
    int three_d_grids_per_block[3];   
};

void BuildSchedulingPresets(std::vector<Presets>& presets)
{
    presets =  std::vector<Presets>{
            { GPU_ARCH::OTHER,   GPU_STRATA::ANY, 32,  128,  32, {4, 4, 4} },

            { GPU_ARCH::PASCAL,  GPU_STRATA::ANY, 32, 128,  32,  {4, 4, 4} },
            { GPU_ARCH::PASCAL,  GPU_STRATA::WORKSTATION, 32, 256,  32,  {8, 8, 8} },
            { GPU_ARCH::PASCAL,  GPU_STRATA::HPC, 128, 512, 128, {8, 8, 8} },

            { GPU_ARCH::VOLTA,   GPU_STRATA::ANY, 32, 128,  32,  {4, 4, 4} },
            { GPU_ARCH::VOLTA,   GPU_STRATA::WORKSTATION, 32, 256,  32,  {8, 8, 8} },
            { GPU_ARCH::VOLTA,   GPU_STRATA::HPC, 128, 512, 128, {8, 8, 8} },

            { GPU_ARCH::TURING,  GPU_STRATA::ANY, 32, 512,  21,  {16, 16, 16} },
    };
}
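
One way such a table might be consulted is sketched below, under the assumption that an exact (architecture, strata) row wins, then the ANY row for that architecture, then the OTHER/ANY row. FindPreset and ExamplePresets are hypothetical names introduced for illustration, and the fallback order is a guess at the intent rather than something stated in the issue. The rows are a subset copied from the table above.

```cpp
#include <cassert>
#include <vector>

enum struct GPU_ARCH { OTHER, PASCAL, VOLTA, TURING };
enum struct GPU_STRATA { ANY, CONSUMER, WORKSTATION, HPC };

struct Presets
{
  GPU_ARCH architecture;
  GPU_STRATA strata;
  int one_d_blocks_per_sm;
  int one_d_grids_per_block;
  int three_d_blocks_per_sm;
  int three_d_grids_per_block[3];
};

// A few rows copied from the table in this issue, enough to exercise the
// lookup below.
inline std::vector<Presets> ExamplePresets()
{
  return std::vector<Presets>{
    { GPU_ARCH::OTHER, GPU_STRATA::ANY, 32, 128, 32, { 4, 4, 4 } },
    { GPU_ARCH::PASCAL, GPU_STRATA::ANY, 32, 128, 32, { 4, 4, 4 } },
    { GPU_ARCH::PASCAL, GPU_STRATA::HPC, 128, 512, 128, { 8, 8, 8 } },
    { GPU_ARCH::VOLTA, GPU_STRATA::ANY, 32, 128, 32, { 4, 4, 4 } },
    { GPU_ARCH::TURING, GPU_STRATA::ANY, 32, 512, 21, { 16, 16, 16 } },
  };
}

// Hypothetical lookup: prefer an exact (architecture, strata) match, then the
// ANY row for the architecture, then the OTHER/ANY row. If the table has no
// OTHER/ANY row, the first row is used as a last resort.
inline const Presets& FindPreset(const std::vector<Presets>& presets,
                                 GPU_ARCH arch,
                                 GPU_STRATA strata)
{
  assert(!presets.empty());
  const Presets* archAny = nullptr;
  const Presets* fallback = &presets.front();
  for (const Presets& p : presets)
  {
    if (p.architecture == arch && p.strata == strata)
    {
      return p; // exact match wins immediately
    }
    if (p.architecture == arch && p.strata == GPU_STRATA::ANY)
    {
      archAny = &p;
    }
    if (p.architecture == GPU_ARCH::OTHER && p.strata == GPU_STRATA::ANY)
    {
      fallback = &p;
    }
  }
  return archAny != nullptr ? *archAny : *fallback;
}
```

For example, FindPreset(ExamplePresets(), GPU_ARCH::TURING, GPU_STRATA::HPC) has no exact row in this subset and so falls back to the TURING/ANY entry. How the architecture and strata of the running device would be detected is left open here.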

Edited Aug 24, 2020 by Robert Maynard