Fix OpenMP reduction when OMP_NUM_THREADS is less than 4
This fixes failures in the following tests, which were triggered because one of the Docker runners in the new CI sets `OMP_NUM_THREADS=3`:
1. `UnitTestOpenMPDeviceAdapter`
2. `UnitTestMeshQualityFilter`
The optimized reduction implementation for OpenMP unrolls the reduce loop in iterations of four elements. When the number of elements is not a multiple of four, the last iteration of the loop can run past the loop's end element.
This commit fixes this by setting the end element of the unrolled OpenMP reduce loop to the closest multiple of four at or below the original end element.
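The idea can be illustrated with a minimal serial sketch. This is not the actual code from the OpenMP device adapter: `RoundDownToMultipleOfFour` and `UnrolledSum` are hypothetical names, the OpenMP pragmas are omitted, and the trailing loop that handles the remaining elements is an assumption about how the leftover elements are reduced.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: closest multiple of four at or below `n`.
std::size_t RoundDownToMultipleOfFour(std::size_t n)
{
  return n - (n % 4);
}

// Hypothetical serial stand-in for the unrolled reduce loop.
long long UnrolledSum(const std::vector<int>& data)
{
  const std::size_t size = data.size();
  // Stop the unrolled loop at a multiple of four so that the
  // accesses at i + 1, i + 2 and i + 3 never pass the end.
  const std::size_t unrolledEnd = RoundDownToMultipleOfFour(size);

  long long sum = 0;
  std::size_t i = 0;
  for (; i < unrolledEnd; i += 4)
  {
    sum += data[i];
    sum += data[i + 1];
    sum += data[i + 2];
    sum += data[i + 3];
  }

  // Assumption: the remaining (size % 4) elements are reduced one by one.
  for (; i < size; ++i)
  {
    sum += data[i];
  }
  return sum;
}
```

With seven elements, for example, the unrolled loop covers indices 0 to 3 and the trailing loop covers indices 4 to 6, so no iteration touches an element past the end.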
Edited by Vicente Bolea