benchmark: Use a more even data distribution
Instead of placing all of the remainder work (the part that does not divide evenly) onto the last rank, this updated start/end calculation spreads it across the highest-numbered ranks, one extra item each.
For example, with 15 spheres and 4 ranks, the existing work distribution is: 3, 3, 3, 6. With the new work distribution calculation it is: 3, 4, 4, 4.
The extra work goes to the back ranks instead of the front ranks so that rank 0 gets a slightly lower workload, since it tends to have some extra management overhead.
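
The calculation described above can be sketched roughly as follows. This is only an illustration, not the benchmark's actual code: the names `block_range` and `counts` are hypothetical, and it assumes each rank owns one contiguous half-open range [start, end) of items.

```python
def block_range(total, num_ranks, rank):
    """Return the half-open [start, end) item range owned by `rank`.

    Hypothetical helper: every rank gets `base` items, and the last
    `extra` ranks each get one more, so rank 0 is never overloaded.
    """
    base, extra = divmod(total, num_ranks)
    # Index of the first rank that receives one extra item.
    first_big = num_ranks - extra
    # Each earlier rank contributed `base` items, plus one for every
    # earlier rank that was already in the "extra" group.
    start = rank * base + max(0, rank - first_big)
    count = base + (1 if rank >= first_big else 0)
    return start, start + count


def counts(total, num_ranks):
    """Per-rank item counts, useful for checking the distribution."""
    return [end - start
            for start, end in (block_range(total, num_ranks, r)
                               for r in range(num_ranks))]


if __name__ == "__main__":
    print(counts(15, 4))  # [3, 4, 4, 4]
```

With this scheme no two ranks differ by more than one item, and the ranges stay contiguous, so each rank can still derive its start/end from its rank number alone.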