Improve VTK-m CUDA scheduling based on Summit scaling study
When benchmarking the VTK-m algorithms on Summit, I discovered that our scheduling choices aren't optimal for the hardware.
This is a short-term fix: we select good numbers for Summit now, and in the future we will make the defaults controllable by the calling program and/or environment variables.
Performance numbers can be found at: https://gitlab.kitware.com/snippets/755
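
The environment-variable override planned above could look roughly like the sketch below. It is only an illustration of the idea, not the VTK-m implementation: the struct, the function names, and the `EXAMPLE_CUDA_*` variable names are all hypothetical, and no default values from this change are implied.

```cpp
#include <cstdlib>

// Hypothetical scheduling parameters. The field names are illustrative
// and are not taken from VTK-m.
struct ScheduleParams
{
  int blocksPerSM;
  int threadsPerBlock;
};

// Parse a positive integer from text. Returns fallback when text is null,
// empty, malformed, non-positive, or unreasonably large.
inline int ParsePositiveOr(const char* text, int fallback)
{
  if (text == nullptr || *text == '\0')
  {
    return fallback;
  }
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  if (*end != '\0' || value <= 0 || value > (1L << 20))
  {
    return fallback;
  }
  return static_cast<int>(value);
}

// Start from compiled-in defaults and let the environment override them.
// The variable names below are assumptions made for this sketch.
inline ScheduleParams ResolveScheduleParams(ScheduleParams defaults)
{
  defaults.blocksPerSM =
    ParsePositiveOr(std::getenv("EXAMPLE_CUDA_BLOCKS_PER_SM"), defaults.blocksPerSM);
  defaults.threadsPerBlock =
    ParsePositiveOr(std::getenv("EXAMPLE_CUDA_THREADS_PER_BLOCK"), defaults.threadsPerBlock);
  return defaults;
}
```

Falling back to the compiled-in value on any malformed input keeps a bad environment setting from silently producing an invalid launch configuration; a calling program could pass its own values in as `defaults` to take part in the same resolution.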