[WIP] Add support for automatically transposing arrays with many components for CUDA.
When encountering arrays whose values are a vtkm::Vec of 5 or more components, let's split them into multiple arrays on the CUDA device to promote better memory access patterns.
The threshold of 5 was chosen because texture memory fetches are already used for vectors of length 1 through 4.
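
As a rough illustration of the layout change, here is a minimal host-side sketch in plain C++. It assumes the split produces one array per component (a structure-of-arrays layout), which is one reading of "multiple arrays" and not necessarily what this change does. `Vec`, `SplitComponents`, and `ShouldSplit` are hypothetical names for this example only and are not part of the VTK-m API; the real change would operate on device memory.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vtkm::Vec<T, N>; not the VTK-m type itself.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Assumed threshold check: vectors of length 1 through 4 keep their
// existing layout, vectors of length 5 or more are split.
constexpr bool ShouldSplit(std::size_t numComponents)
{
  return numComponents >= 5;
}

// Transpose an array of N-component values into N single-component arrays
// (assumption: one output array per component).
template <typename T, std::size_t N>
std::array<std::vector<T>, N> SplitComponents(const std::vector<Vec<T, N>>& input)
{
  std::array<std::vector<T>, N> components;
  for (std::size_t c = 0; c < N; ++c)
  {
    components[c].resize(input.size());
  }
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    for (std::size_t c = 0; c < N; ++c)
    {
      // Component c of value i lands at index i of array c.
      components[c][i] = input[i][c];
    }
  }
  return components;
}
```

With this layout, consecutive threads reading the same component of consecutive values touch adjacent memory, which is the access pattern the split is meant to encourage.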