Fix the default launch sizes for Tesla hardware.
An 8x8x8 thread block is a better launch strategy for most VTK-m kernels. The current problem is that a few VTK-m kernels use a high number of registers per thread, and at this block size the total register demand is more than the hardware can provide.
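
To make the arithmetic concrete, here is a minimal host-side sketch of the register budget check. `BlockDim`, `ThreadsPerBlock`, and `FitsRegisterBudget` are hypothetical names for illustration, not VTK-m API, and the 65536 registers-per-block figure used in the note below is an assumed device limit (on a real device it could be read from `cudaDeviceProp::regsPerBlock`).

```cpp
#include <cstdint>

// Hypothetical block shape type, for illustration only (not VTK-m API).
struct BlockDim
{
  std::uint32_t x, y, z;
};

// Total threads in one block, e.g. 8 * 8 * 8 = 512.
constexpr std::uint32_t ThreadsPerBlock(BlockDim b)
{
  return b.x * b.y * b.z;
}

// True when every thread of the block can get its registers at once.
// regsPerThread:     registers the compiler assigned to the kernel.
// regsPerBlockLimit: device limit on registers per block (assumed value
//                    supplied by the caller).
// This ignores register allocation granularity, so it is optimistic.
constexpr bool FitsRegisterBudget(BlockDim b,
                                  std::uint32_t regsPerThread,
                                  std::uint32_t regsPerBlockLimit)
{
  return static_cast<std::uint64_t>(ThreadsPerBlock(b)) * regsPerThread
    <= regsPerBlockLimit;
}
```

Under the assumed limit of 65536 registers per block, an 8x8x8 block fits only while the kernel uses at most 128 registers per thread (512 * 128 = 65536). A kernel compiled to 200 registers per thread would need 102400 and could not launch at that size.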
In the longer run, we should have more control over kernel launches on a per-kernel basis. This will require VTK-m to extract the number of registers used by each kernel.
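
One possible shape for such a per-kernel control is sketched below. It is only an illustration under stated assumptions: `PickBlockDim` and `BlockDim` are hypothetical names, the halving policy is invented for this example, and the register count is assumed to come from something like the `numRegs` field filled in by `cudaFuncGetAttributes`.

```cpp
#include <cstdint>

// Hypothetical block shape type, for illustration only (not VTK-m API).
struct BlockDim
{
  std::uint32_t x, y, z;
};

// Hypothetical policy: start from the 8x8x8 default and halve the largest
// dimension (preferring z, then y, then x on ties) until the block's total
// register demand fits under the device limit.
// regsPerThread:     per-kernel register count (assumed to be queried
//                    elsewhere, e.g. via cudaFuncGetAttributes).
// regsPerBlockLimit: device limit on registers per block (assumed input).
inline BlockDim PickBlockDim(std::uint32_t regsPerThread,
                             std::uint32_t regsPerBlockLimit)
{
  BlockDim b{ 8, 8, 8 };
  while (static_cast<std::uint64_t>(b.x) * b.y * b.z * regsPerThread >
         regsPerBlockLimit)
  {
    if (b.z > 1 && b.z >= b.y && b.z >= b.x)
    {
      b.z /= 2;
    }
    else if (b.y > 1 && b.y >= b.x)
    {
      b.y /= 2;
    }
    else if (b.x > 1)
    {
      b.x /= 2;
    }
    else
    {
      break; // even a single thread does not fit; the caller must handle this
    }
  }
  return b;
}
```

With an assumed limit of 65536 registers per block, a kernel using 64 registers per thread keeps the 8x8x8 default, while one using 200 registers per thread falls back to 8x8x4.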